Lecture 13 - Unsupervised Learning, PCA ICA


Unsupervised Learning
Agenda
1. Unsupervised learning
2. Clustering
a) K-means clustering
b) Hierarchical clustering
c) Mean shift clustering
d) DBSCAN
e) Clustering metrics
3. Anomaly Detection
a) One-class support vector machine
b) Isolation forest
4. Dimensionality reduction
a) Principal component analysis (PCA)
b) Independent component analysis (ICA)
Unsupervised Learning

Unsupervised learning:
- No labels
- Find hidden structure in the data
- Main algorithms: clustering, dimensionality reduction, anomaly detection

Supervised learning:
- Labels are given
- Learn a mapping from X to y, based on the training dataset
- Main algorithms: classification, regression
Training and Validation

Historical data → data cleaning and preparation → ML algorithms → patterns and rules → validation on real data

Inference

Historical data → data cleaning and preparation → exploitation of the learned algorithms and rules → real data grouping
Clustering
Main task: determine distinct groups (clusters), putting similar data into one group.
Clustering should satisfy the following conditions:
- points inside a cluster should be similar - high intra-cluster similarity;
- points from different clusters should be far away from each other - low inter-cluster similarity.

What is the natural grouping of the data points?

What is the best way to group the points?
What is the number of clusters? Can it be determined automatically?
How to avoid trivial clusters?
Should very big or very small clusters be allowed?
What metrics are used for clustering evaluation?
Clustering methods taxonomy
Distance between points

One way to find similar points is to define a distance function between points.
The following metrics are often used (a short sketch follows the list):

- Euclidean distance
- Manhattan distance
- Minkowski distance
- Jaccard similarity index (for categorical data)
- Cosine similarity
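
As an illustration, the distances above can be computed with NumPy/SciPy; the two points and the categorical sets below are made-up values for the example.

```python
import numpy as np
from scipy.spatial import distance

# Two illustrative points (made-up values)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

print(distance.euclidean(a, b))     # sqrt(sum((a - b)^2))
print(distance.cityblock(a, b))     # Manhattan: sum(|a - b|)
print(distance.minkowski(a, b, p=3))
print(distance.cosine(a, b))        # 1 - cosine similarity

# Jaccard similarity index for two sets of categorical values
s1, s2 = {"red", "green", "blue"}, {"green", "blue", "yellow"}
print(len(s1 & s2) / len(s1 | s2))
```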
K-means
1. Define the number of clusters k.
2. Randomly initialize k centroids.
3. Calculate the Euclidean distance between the centroids and the data points.
4. Group the data points into clusters around the centroids.
5. Recalculate the centroids: let each centroid be the mean of the data points in the corresponding cluster.
6. Repeat steps 3-5 until convergence (see the sketch below).
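
A minimal sketch of running k-means with scikit-learn; the synthetic blobs and the parameter values are chosen only for illustration, and steps 2-6 happen inside fit.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated groups (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)      # iterative assignment/update loop
print(kmeans.cluster_centers_)      # final centroids
print(kmeans.inertia_)              # sum of squared distances to centroids
```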
K-means problems
- the number of clusters must be known a priori
- the method is not robust to clusters of different size, density and shape
- the method is not robust to outliers
- it scales poorly to very large or high-dimensional datasets
Hierarchical clustering

Divisive (top-down):
1. Data is split into two parts.
2. The best splitting is determined.
3. Repeat steps 1-2 until a termination condition is met.

Agglomerative (bottom-up):
1. Each point is considered a separate cluster.
2. The closest points/clusters are merged.
3. Repeat steps 1-2 until a termination condition is met.

Termination conditions:
❑ a predefined number of clusters is reached;
❑ all points are merged into a single cluster, or each point forms a cluster;
❑ the distance between the closest clusters is bigger than a predefined threshold;
❑ the maximum distance between any partitions of a cluster is smaller than a predefined threshold.
Hierarchical clustering
Agglomerative clustering represents the hierarchy of clusters in tree form - a
dendrogram.

The root of the dendrogram is a single cluster that contains all data points.
Each leaf is a separate point.
This model can be easily interpreted and shows the similarity between
different data points.
Final clusters are obtained by pruning the dendrogram at some level; thus,
one can choose the number of clusters to use (see the sketch below).
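
A minimal sketch of agglomerative clustering with SciPy, assuming Ward linkage and a synthetic dataset; the dendrogram is pruned with fcluster to obtain a fixed number of clusters.

```python
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

# Agglomerative clustering: 'ward' merges the pair of clusters
# that gives the smallest increase in within-cluster variance
Z = linkage(X, method="ward")

# Prune the dendrogram to obtain exactly 3 clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# dendrogram(Z) draws the tree itself (requires matplotlib)
```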
Example

Multi-dimensional visualizations for clustering


DBSCAN: density-based spatial clustering
of applications with noise
Suitable for data with a large number of groups that intersect.

The previous algorithms were based on a distance function: points close in distance were put in the
same cluster. On scattered data, algorithms based only on distance tend to
perform badly and to treat every point as a cluster.

DBSCAN is robust to outliers and can also be considered an anomaly detection algorithm.

The algorithm can handle arbitrary cluster shapes and does not require the number of
clusters to be specified a priori.

Parameters:
ε – the maximum neighborhood radius
n – the minimum number of points to form a region
DBSCAN algorithm
1. For the current point, calculate the distance to all other points.
2. Mark all points that are in the ε-neighborhood of the
current point as neighbors.
3. If the number of neighbors is greater than or equal to n, merge the
points to form a cluster.
4. If the number of neighbors is smaller than n, mark the point as an
outlier.
5. Continue with the other points until all of them are marked as outliers or
belong to a cluster (see the sketch below).
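
A short sketch of DBSCAN with scikit-learn on synthetic "two moons" data; the values of eps and min_samples are illustrative and would need tuning on real data.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two intertwined half-moons: a shape that k-means handles poorly
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the neighborhood radius, min_samples the minimum
# number of points required to form a dense region
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

print(set(db.labels_))   # cluster ids; -1 marks outliers (noise)
```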
Mean Shift Clustering
A non-parametric feature-space analysis
technique for locating the maxima of a
density function.
Alternative name: mode-seeking algorithm.

1. Define the kernel (window) size and place a window on the current point.

2. Calculate the centroid of all points located inside the window.
3. Move the window to the centroid.
4. Repeat steps 2-3 until convergence (a sketch follows below).

More information here.


Pros:
• Does not require the number of clusters a priori
• Can handle various cluster shapes
• Can handle various feature spaces
• Robust to outliers

Cons:
• The window size can significantly affect the performance: a wrong size can merge modes or create multiple false-positive modes
• Computationally expensive

OpenCV: cv.meanShift
Sklearn: sklearn.cluster.MeanShift
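
A minimal sketch using sklearn.cluster.MeanShift on synthetic blobs; the bandwidth (window size) is estimated from the data with estimate_bandwidth, and the quantile value is only an example.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=1)

# The bandwidth plays the role of the window (kernel) size;
# estimate_bandwidth gives a data-driven starting point
bw = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bw).fit(X)

print(ms.cluster_centers_)     # one centroid per detected mode
print(len(set(ms.labels_)))    # number of clusters found automatically
```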
How to choose the number of clusters?

"Do not forget that clusters are, in large part, in the
eye of the beholder."
(Estivill-Castro, 2002)

The number of clusters is chosen:

❑ based on metrics, so that the clusters explain the variance in the data;
❑ intuitively or based on prior knowledge (for example, from the
customer).
Clustering Metrics
Elbow rule
1. Group the data into 2, 3, ..., n clusters.
2. For each clustering, calculate the average
distance to the centroid within each cluster and average
the results (D).
3. Plot the graph D = f(n).
4. The optimal cluster number N is located
immediately after the last steep
decline, between D_{N-1} and D_N.

yellowbrick.cluster.elbow.KElbowVisualizer
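
A sketch of the elbow rule using KMeans.inertia_ (the sum of squared distances to the nearest centroid) instead of the average distance D from the slide; both produce the same kind of elbow plot, and KElbowVisualizer automates this.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=7)

# Within-cluster dispersion for k = 2..9
ks = range(2, 10)
scores = [KMeans(n_clusters=k, n_init=10, random_state=7).fit(X).inertia_
          for k in ks]

plt.plot(list(ks), scores, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("sum of squared distances to centroids")
plt.show()   # pick k just after the last steep decline
```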
Clustering Metrics
The silhouette value measures the consistency within clusters of data.

It shows how similar an object is to its own cluster (cohesion) compared to other
clusters (separation).

The silhouette ranges from −1 to +1, where a high
value indicates that the object is well matched to its
own cluster and poorly matched to neighboring
clusters.

Clustering metrics on synthetic data
Sklearn Silhouette analysis
sklearn.metrics.silhouette_score
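
A short sketch computing the mean silhouette score with scikit-learn for several candidate numbers of clusters; the synthetic data has four blobs, so k = 4 should score highest.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=3)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=3).fit_predict(X)
    # Mean silhouette over all samples: close to +1 means points sit
    # well inside their cluster, close to -1 means they are misassigned
    print(k, silhouette_score(X, labels))
```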
Anomaly Detection

An anomaly (also called an outlier or noise) is an
unexpected value or event that significantly
differs from normal behaviour.

Its appearance may indicate objective
internal changes or external interventions.

Examples of anomaly detection here.
Anomaly Detection

• Anomalies occur very rarely (anomaly
detection is a problem
with imbalanced classes);
• The properties of anomalies differ significantly
from those of normal instances.
Gaussian Mixture Models and
Expectation Maximization
Each separate cluster can be
represented as a Gaussian distribution,
so the data is a mixture of Gaussian
distributions.
GMM tries to group points that
come from the same distribution.

If the data is a realization of a mixture of three Gaussians with means (μ1, μ2,
μ3) and standard deviations (σ1, σ2, σ3), then GMM will identify for each
point a probability distribution over the different clusters (see the sketch below).
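
A minimal sketch of fitting a Gaussian mixture with scikit-learn (EM runs inside fit); the three blobs are synthetic and the number of components is assumed known here.

```python
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=2)

# EM fits the means, covariances and weights of the three Gaussians
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=2)
gmm.fit(X)

hard = gmm.predict(X)          # most probable cluster per point
soft = gmm.predict_proba(X)    # per-point probability for each cluster
print(gmm.means_)
print(soft[:3].round(3))
```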
One-class SVM
A modification of the traditional SVM that
transforms the feature space so that the
observations lie as far as possible from the
origin. Two common approaches are
Schölkopf et al. and Tax & Duin.

As a result, "normal" data points lie on one side of the
learned boundary, and abnormal values (anomalies) lie on the
other side.

The algorithm can be used for novelty detection.

The training dataset should not contain any anomalies.

The kernel trick can be used for non-linear transformations of the feature space (see the sketch below).
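
A short sketch of one-class SVM novelty detection with scikit-learn; the training set contains only "normal" points, and the nu and gamma values are illustrative.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(200, 2)                        # "normal" points only
X_test = np.vstack([0.3 * rng.randn(20, 2),
                    rng.uniform(-4, 4, size=(20, 2))])   # mixed with outliers

# nu bounds the fraction of training errors / support vectors;
# the RBF kernel is the non-linear feature-space transformation (kernel trick)
oc_svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

pred = oc_svm.predict(X_test)   # +1 = normal, -1 = anomaly
print(pred)
```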
Isolation Forest (iForest)
Based on the assumption that there is a small number of
anomalies and that they are significantly different from normal
observations.
That is why iForest deliberately isolates abnormal points.

The key idea is that abnormal points are easier to separate:
they end up at a smaller depth of the tree.

Implementation of iForest:
❑ The training set is used to build the isolation trees.
❑ Each point of the test set is run through the iForest and receives an
anomaly score.

It has linear complexity and is easily applicable to large
datasets. It usually performs better than the one-class SVM
approach (see the sketch below).
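
A minimal sketch of IsolationForest with scikit-learn on synthetic data; the contamination value (expected fraction of anomalies) is an assumption for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0, 1, size=(300, 2)),      # normal observations
               rng.uniform(-6, 6, size=(10, 2))])    # a few anomalies

# contamination is the expected fraction of anomalies in the data
iso = IsolationForest(n_estimators=100, contamination=0.03, random_state=42)
iso.fit(X)

labels = iso.predict(X)          # +1 = normal, -1 = anomaly
scores = iso.score_samples(X)    # lower score = more anomalous
print((labels == -1).sum(), "points flagged as anomalies")
```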
What is dimensionality?

Dimensionality is the number of dimensions, features, or variables associated
with a sample of data.

In many cases, numerous dimensions correlate, e.g. two (or more) variables
explain the same phenomenon.

Furthermore, each additional feature adds bias to the data and may represent noise.

Thus, with many variables explaining the same phenomenon, we also increase
the bias in the data, which makes it more difficult for an algorithm to find relations.

In the end, the algorithm might end up mimicking the noise instead of extracting
the meaningful information.
Suppose we have a system that analyzes ECG data from a wearable device
and produces an alert when there are signs of a heart attack.

Much of the ECG signal is noise, and by reducing the
dimensionality we improve the algorithm's accuracy and its performance (time and
computation).
What is dimensionality?
An unsupervised transformation creates a new representation of
the data that is easier to understand (for a human) and analyze
(for an ML algorithm).

Dimensionality reduction takes a high-dimensional representation
of the data and summarizes the data with only its essential
characteristics, namely with fewer features.

Dimensionality reduction manages data sparsity while keeping
the useful information.

A common application is reduction to two dimensions for
visualization purposes.
Dimensionality reduction BRANCHES

Linear projection – projecting data from high- to low-dimensional space:
• Principal component analysis (PCA)
• Singular value decomposition (SVD)
• Linear discriminant analysis (LDA)
• Non-negative matrix factorization (NNMF)
• Dictionary learning
• Random trees embedding
• Independent component analysis (ICA)

Manifold learning – nonlinear dimensionality reduction:
• Isomap
• Multidimensional scaling (MDS)
• Locally linear embedding (LLE)
• t-distributed stochastic neighbour embedding (t-SNE)
Dimensionality reduction APPLICATIONS

• Pre-processing and feature engineering – isolate the most important
components and thus reduce the computation and training time.
• Noise reduction (filtering). Noise does not comprise a large portion of the
variation within the data, so we can remove noise from the signal by removing
the components with smaller variation and then restoring the data back to the
original data space.
• Generating plausible artificial datasets – the method divides the dataset
into components of information and thus gives us the opportunity to
investigate the effect of each component and to generate new dataset samples.
Principal Components Analysis (PCA)

AIM: to find a low-dimensional representation of the data, retaining
as much of the variation as possible.

It finds the features with high correlations, attempts to combine
the highly correlated features, and represents them with a smaller
number of linearly uncorrelated features.
When to use PCA
1. We need to reduce the dimensionality, but do not know which
variables to remove.
2. It is critical to have only independent variables.
3. You are able to tolerate the non-interpretability of the new variables.

If the answers are (1) Yes, (2) Yes or No, (3) Yes – use PCA.
If the answers are (1) Yes, (2) Yes or No, (3) No – PCA is not a good fit, since its components are hard to interpret.
PCA algorithm
1. Standardize the variables column-wise. The sample variance of a column is

$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})$

Having done this, we obtain the matrix Z of standardized features.

2. Compute the covariance matrix S, whose entries are the pairwise covariances

$\sigma(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$

Detailed explanations on the covariance matrix are here.
PCA algorithm
From the standardized data we obtain the covariance matrix S.

3. Calculate the eigenvectors and eigenvalues of S:

$S v = \lambda v$

where $\lambda$ is an eigenvalue and $v$ the corresponding eigenvector.

Covariance matrix × eigenvectors = eigenvalues × eigenvectors
(n × n) (n × k) = (n × k)
where k is the number of features we want to keep.
PCA algorithm

4. Reorient the data:

$Z^* = Z V$
(J × k) = (J × n) (n × k)
J rows and k principal features.

Depending on the sign convention of the eigenvectors produced by the numpy package, a component may need to be multiplied by the sign of its 0th value.

Thus, we obtain a feature matrix that keeps the maximum variance with a
minimal number of features. This matrix is then used as the feature matrix
for ML algorithms, or for visualizations (see the sketch below).
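
A minimal NumPy sketch of steps 1-4 above on a made-up correlated dataset; np.cov and np.linalg.eigh replace the hand-written covariance and eigendecomposition.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5) @ rng.randn(5, 5)       # toy data with correlated features

# 1. Standardize column-wise
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix (n_features x n_features)
S = np.cov(Z, rowvar=False)

# 3. Eigenvectors / eigenvalues, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(S)          # eigh: S is symmetric
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Keep k components and reorient the data: Z* = Z V
k = 2
V = eigvecs[:, :k]
Z_star = Z @ V
print(Z_star.shape)                           # (100, 2)
print(eigvals[:k] / eigvals.sum())            # explained variance ratio
```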
PCA algorithm
1. The covariance matrix contains the relationships between the variables in
our feature matrix.

2. Eigenvectors represent directions, eigenvalues represent importance:
larger values correspond to more important directions.

3. We assume that large variability in a certain direction is associated
with the behavior of the target variable. In other words, small variability is usually
associated with noise, while large variability is typically a signal.
PCA in practice
PCA implementation for face recognition: EIGENFACES and a simpler version,
already in Colab here, dataset, explained here.

Scikit learn PCA

Other types of PCA with explanations and examples

Water treatment PCA manual, data are here, dataset description

Embeddings Projector
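
For comparison, a short sketch of the scikit-learn PCA API, here assuming standardization first and keeping enough components to explain 95% of the variance; the iris data is used only as a placeholder.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Standardize, then keep enough components to explain 95% of the variance
pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_reduced = pipe.fit_transform(X)

print(X_reduced.shape)
print(pipe.named_steps["pca"].explained_variance_ratio_)
```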
Independent component analysis
In real-world data, it is often the case that many independent signals
are mixed together in the observed features.
This group of problems is referred to as the 'cocktail party problem':

we need to separate the signals of the individual speakers with only the data from
the microphones available.
The problem is to separate overlapping (blended) signals.
ICA algorithm
Let n be the number of independent data sources.
The records from the microphones (the dataset) can be modeled as

$X = A s$

A is the n × n mixing matrix, i.e. the matrix of linear combinations that turns the
source signals into the microphone recordings (n people speaking and n microphones).
X is a random vector (of size n) representing the multivariate input measurements;
$x^{(i)}_j$ is the acoustic reading recorded by microphone j at time i.
s is a latent source vector (of size n), whose components are independently
distributed random variables;
$s^{(i)}_j$ is the sound of speaker j at time i.
Given X, the AIM of independent component analysis is:

- to estimate the mixing matrix A;
- to estimate the distribution of the sources $s_j$, j = 1, …, n.

$X = A s$

$s = A^{-1} X$

For convenience we define the unmixing matrix

$W = A^{-1}$

so that $s = W X$, or per sample $s^{(i)} = W x^{(i)}$.

If we denote the j-th row of W as $w_j^T$:

$W = \begin{bmatrix} - \, w_1^T \, - \\ - \, w_2^T \, - \\ \vdots \\ - \, w_n^T \, - \end{bmatrix}$

then the recovered source j is $s^{(i)}_j = w_j^T x^{(i)}$.
ICA ambiguity
1. If we have n sound sources, their order might be different, i.e. we are
not able to distinguish Z from Z·P, where P is a permutation matrix.

Thus, we are not able to distinguish W from P·W. However, this is not critical
for the majority of applications.

2. We are not able to recover the correct scale of the sources.

If the mixing matrix is 2A and the signal is 0.5s, the observed X is the same.
Again, this is not critical, as it only refers to volume scaling.
Stumble on Gaussian
We are able to recover the sources s from the data X only if s is non-Gaussian.
The density of the Gaussian distribution is rotationally symmetric.

For data with n = 2, suppose $s \sim \mathcal{N}(0, I)$, where I is the 2 × 2 identity matrix.

Then $X = As$ is Gaussian with zero mean and covariance $E[XX^T] = AA^T$.

Now take an arbitrary rotation (orthogonal) matrix R, so that $RR^T = R^TR = I$,
and suppose the mixing matrix A has been replaced by $A' = AR$.

The distribution of $X' = A's$ is also Gaussian, with covariance $E[X'X'^T] = ARR^TA^T = AA^T$.

Irrespective of whether the mixing matrix is A or $A' = AR$, we obtain data from the same distribution $\mathcal{N}(0, AA^T)$.

The arbitrary rotational component cannot be determined from the data, because the data is
rotationally symmetric.
Density and
Linear Transformation
Let s be a random variable with density $p_s$, and let X = As.

What is the density of X?
Recall that $W = A^{-1}$.

It seems logical to compute s = WX and then take $p_X(x) = p_s(Wx)$, but this expression is incomplete.

Example: s ~ Uniform[0, 1], so $p_s(s) = 1\{0 \le s \le 1\}$.

Let A = 2, then X = 2s. X is distributed uniformly on the interval [0, 2], so its density is $p_X(x) = 0.5 \cdot 1\{0 \le x \le 2\}$,
and W = 0.5. Thus, the correct way to calculate the density is:

$p_X(x) = p_s(Wx)\,|W|$
ICA algorithm
The joint distribution of the sources is the product of the individual densities:

$p(s) = \prod_{j=1}^{n} p_s(s_j)$

This representation keeps the assumption that the sources are independent.
Using the linear transformation result above, the density of the observations is

$p(x) = \prod_{j=1}^{n} p_s(w_j^T x)\,|W|$

Each density $p_s$ can be obtained by differentiating its cumulative distribution function:

$F(z_0) = P(z < z_0) = \int_{-\infty}^{z_0} p_z(z)\,dz$
ICA algorithm
Thus, to specify the density of each source we need to choose its distribution.
A Gaussian is not an option.

If there is no prior knowledge about the distribution, we take the sigmoid as the CDF:

$g(s) = \frac{1}{1 + e^{-s}}, \qquad p_s(s) = g'(s)$

If there is reliable information about the distribution shape, then you may use that
distribution instead of the sigmoid.

Note that the data need to be pre-processed to have zero mean.
ICA algorithm
The log-likelihood of the data given W is maximized:

$\ell(W) = \sum_{i=1}^{m}\left(\sum_{j=1}^{n} \log g'(w_j^T x^{(i)}) + \log|W|\right) \rightarrow \max_W$

Thus, we can move to gradient ascent:

$W := W + \alpha\left(\left[1 - 2g(Wx^{(i)})\right]x^{(i)T} + (W^T)^{-1}\right)$

As the algorithm converges, we recover the individual signals as $s^{(i)} = W x^{(i)}$.
Scikit learn ICA
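
A short sketch of the cocktail-party setup with sklearn.decomposition.FastICA (a different estimation procedure from the gradient ascent above, but solving the same unmixing problem); the two sources and the mixing matrix below are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Toy cocktail-party setup: two sources, two "microphones"
rng = np.random.RandomState(0)
t = np.linspace(0, 8, 2000)
s = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]   # non-Gaussian sources
A = np.array([[1.0, 0.5],
              [0.7, 1.0]])                          # unknown mixing matrix
X = s @ A.T                                         # observed mixtures (rows = samples)

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)     # recovered sources (up to order and scale)
print(ica.mixing_)               # estimated mixing matrix A
```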


Bonus: UMAP vs t-SNE
• t-distributed Stochastic Neighbour Embedding (t-SNE)
• Uniform Manifold Approximation and Projection (UMAP)

Shared algorithm outline:
• Compute high-dimensional probabilities p.
• Compute low-dimensional probabilities q.
• Calculate the difference between the probabilities with a given cost function C(p, q).
• Minimize the cost function.

- SNE
- t-SNE
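
A minimal t-SNE sketch with scikit-learn on the 64-dimensional digits dataset; the perplexity value is just the default and only illustrative (UMAP offers a similar fit_transform API in the third-party umap-learn package).

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)       # 64-dimensional digit images

# t-SNE follows the outline above: p in high dimension, q in 2-D,
# and a KL-divergence cost between them is minimized
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)                          # (1797, 2)
```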
Bonus: UMAP vs t-SNE

https://pair-code.github.io/understanding-umap/
References
Cheng, Y. (1995). Mean shift, mode seeking, and clustering. IEEE Transactions on pattern analysis and machine
intelligence, 17(8), 790-799.

Dangeti, P. (2017). Statistics for machine learning: Build supervised, unsupervised, and reinforcement learning
models using both Python and R. Birmingham, UK : Packt Publishing.
Ertel, W., Black, N., & Mast, F. (2017). Introduction to artificial intelligence. Cham, Switzerland : Springer.

Igual, L., & Seguí, S. (2017). Introduction to Data Science: A Python approach to concepts, techniques and
applications. Springer International Publishing : Imprint : Springer. *

Johnston, B., Jones, A., Kruger, C., & Safari, an O'Reilly Media Company. (2019). Applied unsupervised
learning with Python. Packt Publishing, 2019.

Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation forest. 2008 Eighth IEEE International Conference on Data
Mining.
Liu, F. T., Ting, K. M., Zhou, Z-H. (2012). Isolation-based anomaly detection. ACM Transactions on Knowledge
Discovery from Data, 6(1)
References
Patel, A. A. (2019). Hands-On Unsupervised learning using Python: How to build applied machine learning
solutions from unlabeled data. O'Reilly Media. *


Pradhan M., Kumar U. (2019). Machine Learning using Python. Wiley India. *

Swamynathan, M. (2019). Mastering Machine Learning with Python in Six Steps: A Practical Implementation
Guide to Predictive Data Analytics Using Python. Berkeley, CA: Apress L.P.
References
Ayramo, S., Karkkainen, T. (2006). Introduction to partitioning-based clustering methods with a robust example.
Reports of the Department of Mathematical Information Technology Series C. Software and Computational
Engineering, 1, 1-34.

Comaniciu, D., Meer, P. (2002). Mean Shift: A robust approach toward feature space analysis. IEEE Transactions
on pattern analysis and machine intelligence, 24(5), 603-619.

Estivill-Castro, V. (2002). Why so many clustering algorithms — A Position Paper. SIGKDD Explorations, 4 (1),
65-75.
Ghassabeh, Y. A. (2015). A sufficient condition for the convergence of the mean shift algorithm with Gaussian kernel. Journal of
Multivariate Analysis, 135, 1-10.
Zimek, A., & Filzmoser, P. (2018). There and back again: Outlier detection between statistical reasoning and
data mining algorithms. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(6).
