
Machine Learning
Lecture 04: Machine Learning Basics

Arslan Ali Khan
[email protected]
Department of Cyber-Security and Data Science
Riphah Institute of Systems Engineering (RISE),
Riphah International University, Islamabad, Pakistan.
Confusion Matrix

Sensitivity or Recall

• Sensitivity is a measure of how well a machine learning model can detect positive instances. It is also known as the true positive rate (TPR) or recall.
• In other words, sensitivity measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
• Sensitivity is used to evaluate model performance because it allows us to see how many positive instances the model was able to correctly identify.

Sensitivity = TP / (TP + FN)
Sensitivity = True Positives / (True Positives + False Negatives)
Sensitivity or Recall

• Let’s consider an example of a medical test for a rare disease to understand the concept of sensitivity. Suppose that the test has a sensitivity of 95%. This means that if 100 people who have the disease take the test, the test will correctly identify 95 of them as positive, but it will miss 5 of them (false negatives).
• As shown above, a model with high sensitivity will have few false negatives, which means that it misses only a few of the positive instances.
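As a quick check, the arithmetic of this example in a few lines of Python:

```python
# Sensitivity from the example above: 95 true positives, 5 false negatives.
tp, fn = 95, 5
sensitivity = tp / (tp + fn)
print(sensitivity)  # 0.95, i.e., the test finds 95% of the sick patients
```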
Sensitivity or Recall

• In various use cases, it is important for models to have high sensitivity because we want them to find all of the positive instances in order to make accurate predictions.
• The sum of sensitivity (true positive rate) and the false negative rate is always 1. The higher the true positive rate, the better the model is at correctly identifying positive cases.
Specificity

• When sensitivity is used to evaluate model performance, it is often compared to specificity. Specificity measures the proportion of actual negatives that are correctly identified by the model; for this reason it is also called the true negative rate (TNR).
• This implies that there is another proportion of actual negatives which get predicted as positive; these are the false positives.

Specificity = TN / (TN + FP)
Specificity = True Negatives / (True Negatives + False Positives)
Specificity

• Let’s consider an example of a medical test for a rare disease. Suppose that the test has a specificity of 95%. This means that if 100 people who do not have the disease take the test, the test will correctly identify 95 of them as negative, but it will incorrectly identify 5 of them as positive (false positives).
• Thus, specificity in this case is the proportion of people not suffering from the disease who are correctly predicted as not suffering from it. In other words, specificity is the proportion of actually healthy people who are predicted to be healthy.
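Again, the arithmetic of this example in Python:

```python
# Specificity from the example above: 95 true negatives, 5 false positives.
tn, fp = 95, 5
specificity = tn / (tn + fp)
print(specificity)  # 0.95, i.e., 95% of healthy people are cleared correctly
```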
Specificity

• The sum of specificity (true negative rate) and the false positive rate is always 1. High specificity means that the model correctly identifies most of the negative results, while low specificity means that the model mislabels many negative results as positive.
Sensitivity vs Specificity

• Sensitivity: The ability of a test to correctly identify patients with a disease.
• Specificity: The ability of a test to correctly identify people without the disease.
Accuracy

• Accuracy is a metric that measures how often a machine learning model correctly predicts the outcome. You can calculate accuracy by dividing the number of correct predictions by the total number of predictions.
• In other words, accuracy answers the question: how often is the model right?

Accuracy = (TP + TN) / (TP + TN + FP + FN)
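A minimal sketch using scikit-learn, with hypothetical labels made up for illustration:

```python
from sklearn.metrics import accuracy_score

# Hypothetical ground-truth and predicted labels.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred))  # 6 correct out of 8 -> 0.75
```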
Precision

• Precision is a metric that measures how often a machine learning model correctly predicts the positive class. You can calculate precision by dividing the number of correct positive predictions (true positives) by the total number of instances the model predicted as positive (both true and false positives).
• Precision is also known as the positive predictive value.

Precision = TP / (TP + FP)
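Continuing with the same hypothetical labels as in the accuracy sketch:

```python
from sklearn.metrics import precision_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
# 4 predicted positives, of which 3 are true positives -> 3/4
print(precision_score(y_true, y_pred))  # 0.75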
F1 score

• The F1 score or F-measure is the harmonic mean of the precision and recall of a classification model. The two metrics contribute equally to the score, ensuring that the F1 metric correctly indicates the reliability of a model.

F1 = 2 * (precision * recall) / (precision + recall)

where precision is the number of true positives divided by the sum of true positives and false positives, and recall is the number of true positives divided by the sum of true positives and false negatives.
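With the same hypothetical labels, the harmonic-mean formula and scikit-learn's f1_score agree:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
p = precision_score(y_true, y_pred)  # 0.75
r = recall_score(y_true, y_pred)     # 3 of 4 actual positives found -> 0.75
print(2 * (p * r) / (p + r))         # harmonic mean: 0.75
print(f1_score(y_true, y_pred))      # same value from sklearn
```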
Confusion Matrix: Example
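The original slide shows this example as a figure, which is not reproduced here. A minimal sketch with the same hypothetical labels shows how every metric above falls out of one confusion matrix:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

# For binary labels, sklearn orders the matrix as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Sensitivity/Recall:", tp / (tp + fn))
print("Specificity:       ", tn / (tn + fp))
print("Accuracy:          ", (tp + tn) / (tp + tn + fp + fn))
print("Precision:         ", tp / (tp + fp))
```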
Mathematical Concepts (Self Study)

• Scalar
• Vector
• Matrix
• Norm
• Eigen Decomposition
• Singular Value Decomposition
• Random Variable
• Probability Distribution
• Probability Mass Function
• Probability Density Function
Feature Engineering

• Dealing with Missing Data
Missing values are data points that are absent for a specific variable in a dataset. They can be represented in various ways, such as blank cells, null values, or special symbols like “NA” or “unknown.” These missing data points pose a significant challenge in data analysis and can lead to inaccurate or biased results.
Feature Engineering

• Dealing with Missing Data
Missing values can pose a significant challenge in data analysis, as they can:
• Reduce the sample size: This can decrease the accuracy and reliability of your analysis.
• Introduce bias: If the missing data is not handled properly, it can bias the results of your analysis.
• Make it difficult to perform certain analyses: Some statistical techniques require complete data for all variables, making them inapplicable when missing values are present.
Feature Engineering

• Dealing with Missing Data
• Replacing missing values with estimated values.
  • Preserves sample size: Doesn’t reduce data points.
  • Can introduce bias: Estimated values might not be accurate.

Use of Mean, Median, and Mode:
• Replace missing values with the mean, median, or mode of the relevant variable.
• Simple and efficient: Easy to implement.
• Can be inaccurate: Doesn’t consider the relationships between variables.
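A minimal sketch of mean/median/mode replacement with pandas (the dataset and column names are hypothetical):

```python
import pandas as pd

# Hypothetical dataset with missing values.
df = pd.DataFrame({
    "age":    [25, None, 37, 29, None],
    "income": [40000, 52000, None, 61000, 48000],
    "city":   ["Lahore", "Islamabad", None, "Islamabad", "Karachi"],
})

# Numeric columns: fill with the median or mean of the column.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].mean())

# Categorical column: fill with the mode (most frequent value).
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```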


Feature Engineering

• Handling Categorical Data
Categorical data is data that can be divided into groups or categories, such as gender, hair color, or product type.
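Most models require numeric input, so categorical features are usually encoded. A minimal sketch using one-hot encoding with pandas (the data is hypothetical; scikit-learn's OneHotEncoder is a common alternative):

```python
import pandas as pd

# Hypothetical categorical feature.
df = pd.DataFrame({"hair_color": ["black", "brown", "red", "black"]})

# One-hot encoding: one binary column per category.
print(pd.get_dummies(df, columns=["hair_color"]))
```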
Feature Engineering

• Normalizing Data
Normalization in machine learning is the process of translating data into the range [0, 1] (or any other range).
• Feature Construction or Generation
Feature generation (also known as feature construction, feature extraction, or feature engineering) is the process of transforming features into new features that better relate to the target. This can involve mapping a feature into a new feature using a function like log, or creating a new feature from one or multiple features using multiplication or addition.
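A minimal sketch of both ideas with pandas and NumPy (the features are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, 50.0, 200.0, 1000.0],
                   "quantity": [3, 1, 4, 2]})

# Normalization: rescale "price" into the range [0, 1].
p = df["price"]
df["price_norm"] = (p - p.min()) / (p.max() - p.min())

# Feature construction: a log transform and a product of two features.
df["log_price"] = np.log(df["price"])
df["revenue"] = df["price"] * df["quantity"]

print(df)
```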
Feature Scaling

A technique often applied as part of data preparation for machine learning.

Goal: Change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values.

Normalization
• Min-max normalization: Guarantees all features will have the exact same scale but does not handle outliers well.
• Z-score standardization: Handles outliers, but does not produce normalized data with the exact same scale.
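A minimal sketch contrasting the two, assuming scikit-learn's scalers and a made-up feature with one outlier:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One hypothetical feature; 1000 is an outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

# Min-max: everything lands in [0, 1], but the outlier squashes
# the other values toward 0.
print(MinMaxScaler().fit_transform(X).ravel())

# Z-score: zero mean and unit variance, but no fixed output range.
print(StandardScaler().fit_transform(X).ravel())
```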
Training, Testing and Validation Sets
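The original slides illustrate the three splits with figures that are not reproduced here. A minimal sketch of one common way to produce such splits, assuming scikit-learn's train_test_split (the split ratios are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Carve off 20% as the test set, then split the remainder into
# training and validation sets.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```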
K-Fold Cross Validation

• K-fold cross-validation is a technique for evaluating predictive models.
• The dataset is divided into k subsets or folds. The model is trained and evaluated k times, using a different fold as the validation set each time.
• Performance metrics from each fold are averaged to estimate the model's generalization performance.
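A minimal sketch, assuming scikit-learn's cross_val_score with an illustrative model and k = 5:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: five train/validate rounds, one score each.
scores = cross_val_score(model, X, y, cv=5)
print(scores)         # one accuracy value per fold
print(scores.mean())  # averaged estimate of generalization performance
```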
Under-fitting and Over-fitting

• Overfitting occurs when the model fits the training data too well and does not generalize, so it performs badly on the test data. It is the result of an excessively complicated model.
• Underfitting occurs when the model does not fit the data well enough. It is the result of an excessively simple model.
Under-fitting and Over-fitting

• Both overfitting and underfitting lead to poor predictions on new datasets.
• A learning model that overfits or underfits does not generalize well.
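A minimal sketch of both failure modes on made-up noisy data, assuming scikit-learn: a degree-1 polynomial underfits, while a degree-15 polynomial overfits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, 40).reshape(-1, 1)
y = np.sin(2 * X).ravel() + rng.normal(0, 0.2, 40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    # Underfitting: low R^2 on both sets. Overfitting: high train R^2
    # but a much lower test R^2.
    print(degree, model.score(X_tr, y_tr), model.score(X_te, y_te))
```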

About Python

Familiarize yourself with Python programming this week.


Python

• Install Anaconda Navigator https://www.anaconda.com/products/individual


Python

Environments and Libraries

Environments: Notebook, Qtconsole, Orange, VSCode, PyCharm
Libraries: Pandas, SciPy, Matplotlib, Sklearn, NumPy
Python Exercises to solve this week

• https://pynative.com/python-exercises-with-solutions/

• https://www.w3resource.com/machine-learning/scikit-learn/iris/index.php

• https://www.practicepython.org/
Reading Task for this week

Relevant sections from Chapter 2 of the textbook.


Reading Task for this week
Part I: Understanding Machine Learning, Chapters 2 and 3
Chapters 1 and 2
