Machine Learning Study Sheets
• Application:
• Assumptions: predict the future based on the past
• Result: reliability / interpretation
Supervised
Supervised learning is a machine learning approach that’s defined by its use of labeled datasets.
These datasets are designed to train or ‘supervise’ algorithms into classifying data or predicting
outcomes accurately.
- The model can measure its accuracy and learn over time
- Two types of supervised problems: classification and regression
Classification problems use an algorithm to assign test data into specific categories, such as separating
apples from oranges.
Regression models use an algorithm to predict numerical values based on different data points, such as
sales revenue projections for a given business.
Classification models: ▪ Logistic Regression ▪ K-Nearest Neighbours ▪ Support Vector Machines ▪ Kernel
SVM ▪ Naïve Bayes ▪ Decision Tree classifier ▪ Random Forests classifier ▪ Neural Network classifier
Regression models: • Simple Linear Regression • Multiple Linear Regression • Polynomial Regression •
Support Vector Regression • Decision Tree • Random Forests • Neural Network
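A minimal scikit-learn sketch of the two supervised problem types; the datasets and model choices here are illustrative, not prescribed by the notes:

```python
# Minimal sketch, assuming scikit-learn is available; toy datasets and models are illustrative.
from sklearn.datasets import load_iris, make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

# Classification: assign observations to categories (here: iris species).
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("classification accuracy:", clf.score(X_te, y_te))

# Regression: predict a numerical value (here: a synthetic continuous target).
Xr, yr = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
reg = LinearRegression().fit(Xr_tr, yr_tr)
print("regression R^2:", reg.score(Xr_te, yr_te))
```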
Unsupervised
Unsupervised learning uses machine learning algorithms to analyse and cluster unlabeled data sets.
These algorithms discover hidden patterns in data without the need for human intervention (hence,
they are “unsupervised”).
• Clustering: groups unlabeled data based on their similarities or differences, e.g. k-means clustering.
Frequently used for market segmentation, image compression, etc.
• Association: uses different rules to find relationships between variables in a given dataset. These
methods are frequently used for market basket analysis and recommendation engines, along the lines of
“Customers Who Bought This Item Also Bought” recommendations.
• Dimensionality reduction: a technique used when the number of features (or instances) in a given
dataset is too high. It reduces the input data to a manageable size while preserving the data integrity.
It is often used in the data preprocessing stage, for example when autoencoders remove noise from
visual data to improve picture quality.
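As a simple illustration of the dimensionality-reduction idea above (using PCA here rather than an autoencoder, for brevity), a hedged scikit-learn sketch:

```python
# Minimal sketch, assuming scikit-learn; PCA stands in for an autoencoder as a simpler example.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)      # 64 pixel features per image
pca = PCA(n_components=10).fit(X)        # keep only 10 components
X_reduced = pca.transform(X)             # reduced representation of the same data
print(X.shape, "->", X_reduced.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())
```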
Key steps: defining your problem and variables, knowledge, tuning parameters, model evaluation.
Clustering
Supervised Vs Unsupervised
Predictive Model: 𝑦 = 𝑓(𝑋1, 𝑋2, … , 𝑋𝑝)
• Supervised learning is where you have input data (X) and output data (y) and you use an algorithm to
learn the mapping function 𝑓 from the input to the output.
• E.g. linear regression, random forest, neural network, support vector machine…
▪ Unsupervised learning is where you only have input data (X) and NO corresponding output data.
▪ The goal for unsupervised learning is to model the underlying structure/pattern in the data.
▪ Association: discover rules that describe large portions of your data, e.g. people that buy A also tend
to buy B.
▪ Example algorithms: K-means, Apriori, …
This distance sums up the straight-line distances along each dimension's axis.
▪ For categorical variables, convert them to numeric data or use similarity through overlap (how many
values overlap between two observations) → not recommended
1. First, choose the number of clusters k and pick the initial centroids (random or dissimilarity-based,
see below).
2. Next, assign each observation to the cluster whose centroid is closest to the observation in terms of
the Euclidean distance.
3. Then, re-calculate the centroid of each cluster by taking the multidimensional mean of all
observations currently in the cluster.
4. Repeat Steps 2 and 3 until the maximum number of iterations has been reached or the clusters no
longer change.
• In practice, it is computationally fast, but it may suffer from the 'curse of dimensionality' when the
number of dimensions is high.
• The result may depend on the initial clusters. These can be chosen either at random or based on
dissimilarity to achieve faster convergence.
• Variants of the k-means algorithm: k-medians algorithm, algorithm based on Manhattan distance,…
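A minimal k-means sketch, assuming scikit-learn and synthetic two-cluster data; n_init re-runs the algorithm from several random initialisations, which relates to the initialisation point above:

```python
# Minimal sketch, assuming scikit-learn; data and the choice k=2 are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# n_init controls how many random initialisations are tried, since the result
# can depend on the initial centroids.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("centroids:", km.cluster_centers_)
print("first labels:", km.labels_[:5])
```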
Simple Linear Regression
Varying the values of 𝛽0 and 𝛽1 gives different distances of the observations to the line.
The goal of fitting a linear regression model is to estimate the two parameters, the slope 𝛽1 and the
intercept 𝛽0, that minimise the overall distance.
In general, the fitted values 𝑦𝑖^ differ from the actual observed values of the response (𝑦𝑖), and the
differences between the two are called residuals:
𝑒𝑖 = 𝑦𝑖 − 𝑦𝑖^
We identify as the (estimated) linear relationship between 𝑦 and 𝑥 the particular line that fits the data
best. The approach usually adopted to find the "best-fit" line is called the least squares method.
The idea of least squares: among all possible lines that pass through the points in the scatterplot, the
best one is the line that minimises the sum of squared residuals (which represents a measure of the
overall prediction error):
𝑅𝑆𝑆 = sum(𝑦𝑖 − 𝑦𝑖^)^2
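A minimal NumPy sketch of this idea on synthetic data, using the closed-form least-squares estimates for the slope and intercept:

```python
# Minimal sketch with synthetic data: closed-form least-squares fit of y = b0 + b1*x.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)     # true intercept 2.0, true slope 1.5

# Least-squares estimates of the slope and intercept.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print("slope:", b1, "intercept:", b0)
print("sum of squared residuals:", np.sum(residuals ** 2))
```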
Hypothesis testing
The null hypothesis 𝐻0: 𝛽1 = 0 VS the alternative hypothesis H1: 𝛽1 ≠ 0
• P-values are used in hypothesis testing to help decide whether to reject the null hypothesis.
• The smaller the p-value, the stronger the evidence against the null hypothesis.
• The most common threshold is p < 0.05, which means that data at least as extreme as what we
observed would occur less than 5% of the time under the null hypothesis.
t and P>|t|:
They tell us something about whether or not the relationship between the predictor and the response is
significant.
A hypothesis test: the null hypothesis 𝑯𝟎: 𝜷𝟏 = 𝟎 VS the alternative hypothesis 𝐇𝟏: 𝜷𝟏 ≠ 𝟎
▪ To test the null hypothesis, we need to determine whether our estimate 𝛽1^ is sufficiently far from
zero that we can be confident that the true 𝛽1 is non-zero.
▪ To test this hypothesis, linear regression performs a t-test, and the outputs of this test are the
t-statistic (t) and the p-value (P>|t|).
▪ If the t-statistic is very large (in absolute value), the alternative hypothesis is likely to be true. A basic
rule is to reject the null hypothesis in favour of the alternative hypothesis when the p-value is smaller
than 0.05.
This means that the probability of observing such an extreme value for 𝛽1^ is less than 5%, given that
the null hypothesis is true. Hence, it is very unlikely that the null hypothesis is true, and we can say that
the predictor has a significant influence at the 5% significance level.
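A minimal statsmodels sketch (synthetic data assumed) showing where the coefficient, t and P>|t| values come from:

```python
# Minimal sketch, assuming statsmodels; the summary table reports coef, t and P>|t| per predictor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)

X = sm.add_constant(x)                 # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.summary())                 # includes coef, t, P>|t| and confidence intervals
print("p-value for the slope:", model.pvalues[1])
```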
Strictly speaking a 95% confidence interval means that if we were to take 100 different samples
(datasets) and compute a 95% confidence interval from each sample, then approximately 95 of the 100
confidence intervals will contain the ‘true’ value.
The R-squared (R²) is a measure of goodness-of-fit and shows us the explanatory power of our model.
R² = (𝑇𝑆𝑆 − 𝑅𝑆𝑆) / 𝑇𝑆𝑆 = 1 − 𝑅𝑆𝑆 / 𝑇𝑆𝑆
RSS (also written SSR) is the residual sum of squares, 𝑅𝑆𝑆 = sum(𝑦𝑖 − 𝑦𝑖^)^2, i.e. the variability that is
unexplained by the model; TSS is the total sum of squares, 𝑇𝑆𝑆 = sum(𝑦𝑖 − mean(𝑦))^2, i.e. the total
variability in the response.
The R² indicates how much of the variance in the response variable (Y) is explained by the model.
• The R² is a number between 0 and 1. The closer it is to 0, the less of the variability in the response the
model explains; the closer to 1, the more it explains.
• In simple linear regression, the R² equals the squared correlation between predictor and response.
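A short NumPy sketch (same synthetic data as above) computing R² as 1 − RSS/TSS and checking that it equals the squared correlation in simple linear regression:

```python
# Minimal sketch: R^2 = 1 - RSS/TSS for a fitted simple linear regression (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

rss = np.sum((y - y_hat) ** 2)          # unexplained variability
tss = np.sum((y - y.mean()) ** 2)       # total variability
print("R^2:", 1 - rss / tss)
print("squared correlation:", np.corrcoef(x, y)[0, 1] ** 2)  # equal in SLR
```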
• Model validation: split the data into training data (used to fit the model) and holdout data (used to
evaluate it).
Multiple Linear Regression
Multiple linear regression enables you to predict a continuous response using two or more continuous
or discrete predictors.
The basic equation that defines the multiple linear regression (MLR) model is
𝑌 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + ⋯ + 𝛽𝑝𝑋𝑝 + 𝜀
• The coefficients 𝛽𝑗 (𝑗 = 1, … , 𝑝) are called the partial slopes of the response variable with respect to
the j-th predictor 𝑥𝑗.
Estimation of the 𝛽𝑗's is still performed using least squares, and all the quantities we introduced for the
simple linear regression model are used in multiple regression in a similar way.
Specifically, the principle for finding the optimal coefficient estimates remains the same as in simple
linear regression, namely minimising the residual sum of squares (RSS).
Once we have the coefficient estimates, we can make a prediction for a single instance 𝒙𝒊 = (𝑥𝑖1,
𝑥𝑖2, … , 𝑥𝑖𝑝) using the equation
𝑦𝑖^ = 𝛽0^ + 𝛽1^𝑥𝑖1 + 𝛽2^𝑥𝑖2 + ⋯ + 𝛽𝑝^𝑥𝑖𝑝
In multiple linear regression, the coefficient 𝛽𝑗^ represents the average effect of a one-unit increase in
𝑥𝑗 while keeping all other predictors fixed.
MLR: OLS
SLR: box office sales depend on advertising spending.
MLR: box office sales depend on economic growth and advertising spending: 𝑌 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝜀
The MLR OLS solution chooses the plane that is as close as possible to the data points, i.e. it minimises
the sum of squared vertical distances between each observation and the plane.
Interpretation
The coefficients are different compared to SLR.
The coefficient of advertising says that, with a one-unit increase in spending, the visitor count would
increase by 1.54 on average, while keeping economic growth constant.
The OLS procedure and corresponding interpretation can easily be extended to 3, 4, or 𝑝 predictors.
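A hedged sketch of a two-predictor OLS fit; the predictors below are synthetic stand-ins for economic growth and advertising spending, so the coefficients will not match the 1.54 quoted above:

```python
# Minimal sketch, assuming statsmodels; the two predictors and the response are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
growth = rng.normal(2, 1, 200)                   # stand-in for economic growth
advertising = rng.uniform(0, 50, 200)            # stand-in for advertising spending
sales = 10 + 3 * growth + 1.5 * advertising + rng.normal(0, 2, 200)

X = sm.add_constant(np.column_stack([growth, advertising]))
fit = sm.OLS(sales, X).fit()
# Each coefficient is the average effect of a one-unit increase in that
# predictor, holding the other predictor fixed.
print(fit.params)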
Assumptions
𝑌 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + ⋯ + 𝛽𝑝𝑋𝑝 + 𝜀
Assumption 1: linear regression assumes that there is a linear relationship between predictor and
response.
To test whether there is a linear relationship between a predictor and the response, you can make a
scatterplot of the response against each predictor. If the relationship is linear, the response will increase
or decrease linearly with the predictor. Another option is to make a residual plot, which depicts the
residuals against the fitted values. If the relationship is linear, the points should be randomly scattered
around the horizontal line at zero.
Assumption 2: Multiple linear regression assumes that there is NO multicollinearity in the data.
This means that the predictors cannot be too closely related to each other or, in other words, that the
correlation between the predictors is not too high. The easiest, and most effective, way to deal with
multicollinearity is to delete highly correlated variables.
1. Calculate the Pearson correlation matrix among all predictors. This matrix shows the correlation
between every pair of predictors; a rule of thumb is that the correlation between two predictors should
be smaller than 0.80.
2. A second way is to calculate the Variance Inflation Factor (VIF). The VIF for the j-th predictor in a
linear regression model with p predictors is defined as
𝑉𝐼𝐹𝑗 = 1 / (1 − 𝑅𝑗^2)
where 𝑅𝑗^2 is the R-squared value for the (auxiliary) regression of the j-th predictor versus all the
remaining ones.
The VIF provides a measure of the reduction in the precision of a coefficient estimate. VIF is a number
greater than or equal to 1.
• When it is equal to 1, it means that the predictor is not affected by any collinearity problem.
• The larger it is, the stronger the association of that predictor with all the other ones in the model.
• If the VIF of a predictor is larger than 10, we flag the predictor as affected by a severe collinearity
problem.
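A minimal statsmodels/pandas sketch computing a VIF per predictor on a synthetic dataset in which two predictors are deliberately correlated:

```python
# Minimal sketch, assuming statsmodels and pandas; the data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200)})
df["x2"] = 0.9 * df["x1"] + rng.normal(scale=0.3, size=200)   # strongly correlated with x1
df["x3"] = rng.normal(size=200)

X = sm.add_constant(df)                       # VIFs are computed on the design matrix
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, "VIF:", variance_inflation_factor(X.values, i))
```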
• The error terms are assumed to be independent; in other words, there is no autocorrelation between
the error terms. Independent means that the error term of a certain observation i does not say anything
about the error term of observation j.
• Autocorrelation can be tested with a scatterplot or with an autocorrelation test, e.g. the
Durbin-Watson test. The null hypothesis for this test is that there is no linear autocorrelation between
the error terms.
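A short statsmodels sketch (synthetic data) computing the Durbin-Watson statistic from OLS residuals:

```python
# Minimal sketch, assuming statsmodels; Durbin-Watson statistic on OLS residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1 + 2 * x + rng.normal(0, 1, 100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
# Values near 2 suggest no first-order autocorrelation in the residuals;
# values towards 0 or 4 suggest positive or negative autocorrelation.
print("Durbin-Watson:", durbin_watson(fit.resid))
```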
Assumption 5: homoscedasticity
• If the error terms are homoscedastic, we see a chaotic scatterplot of the error terms with no real
pattern. We do not notice a changing spread of the error terms; instead, the spread is constant and we
have a constant variance.
• If there is no homoscedasticity (i.e. heteroscedasticity), we see a pattern in the scatterplot. Often
this pattern has a funnel shape, which represents a changing spread of the error terms.
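A hedged matplotlib sketch of a residuals-vs-fitted plot; the synthetic data below are deliberately heteroscedastic, so a funnel shape should appear:

```python
# Minimal sketch, assuming matplotlib and statsmodels; a residuals-vs-fitted plot.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1 + 2 * x + rng.normal(0, 1 + 0.5 * x)       # error spread grows with x (heteroscedastic)

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, color="red")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()                                        # a funnel shape indicates heteroscedasticity
```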
Why feature selection?
Many Machine Learning problems involve thousands or even millions of features for each training
instance. Not only does this make training extremely slow, it can also make it much harder to find a
good solution. This problem is often referred to as the curse of dimensionality.
• Sparsity of data occurs when moving to higher dimensions, which leads to weak statistical significance.
– need more data
• The number of model parameters to be estimated and the training time increase as the number of
features increases. – computational burden
• Feature selection: selecting the most useful features to train on among existing features.
1. Filter methods
Find a single measure that relates each independent variable to the dependent variable, which can
be used to score the importance of the independent variable, e.g. correlation. The outcome can be
used for ranking. Advantages: computationally fast and model-free.
2. Wrapper methods
Generate a vast number of models with various subsets of independent variables to check which
subset gives the best performance. Advantages: tests various combinations of predictors;
interactions are taken into account. Drawback: model-specific and computationally expensive.
3. Embedded methods
Some algorithms have built-in feature selection thanks to their way of modelling, e.g. Random
Forest and Lasso regression.
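A short scikit-learn/pandas sketch contrasting a filter method (correlation ranking) with an embedded method (Lasso) on a synthetic dataset where only two features matter; feature names and the alpha value are illustrative:

```python
# Minimal sketch, assuming scikit-learn and pandas; data and parameter choices are illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"x{i}" for i in range(5)])
y = 3 * X["x0"] - 2 * X["x1"] + rng.normal(size=200)   # only x0 and x1 matter

# Filter method: rank features by absolute correlation with the response.
print(X.corrwith(pd.Series(y)).abs().sort_values(ascending=False))

# Embedded method: Lasso shrinks irrelevant coefficients towards exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)
print(dict(zip(X.columns, lasso.coef_)))
```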
Standardised coefficients: 𝛽𝑖∗ = 𝛽𝑖 (𝑠𝑖 / 𝑠𝑌), where 𝑠𝑖 and 𝑠𝑌 are the (estimated) standard deviations of
𝑋𝑖 and 𝑌, respectively. The 𝛽𝑖∗ indicate how many standard deviations the dependent variable will
change per standard deviation increase in the predictor variable. The coefficients 𝛽𝑖∗ are independent
of the involved variables' units of measurement (unitless).
Spearman rank correlation: 𝑟𝑠 = 1 − 6 sum(𝑑𝑖^2) / (𝑛(𝑛^2 − 1)), where 𝑑𝑖 = 𝑅(𝑥𝑖) − 𝑅(𝑦𝑖) is the
difference between the ranks 𝑅(𝑥𝑖) and 𝑅(𝑦𝑖) of each observation.
• Non-parametric
• Tests for monotonic (not necessarily linear) relationships
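A small SciPy sketch (synthetic data) contrasting Pearson and Spearman correlation on a monotonic but non-linear relationship:

```python
# Minimal sketch, assuming SciPy; Pearson measures linear, Spearman monotonic association.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = np.exp(x) + rng.normal(0, 1, 200)      # monotonic but clearly non-linear

print("Pearson r:   ", stats.pearsonr(x, y)[0])
print("Spearman rho:", stats.spearmanr(x, y)[0])   # closer to 1 for monotonic data
```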
Correlation vs Causation
Correlation does not necessarily imply causation. One variable's influence might be overshadowed by
the others.