ML Unit 2 Notes
Example
Imagine a machine learning model is trained on a large dataset of unlabeled images containing both dogs and cats. The model has never seen an image of a dog or a cat before, and it has no pre-existing categories for these animals. We now want to use unsupervised learning to separate the dog images from the cat images.
Because the images carry no labels, the machine has no idea of the features that define dogs and cats, so it cannot name the two categories. It can, however, group the images according to their similarities, patterns, and differences: the first group may contain all the pictures with dogs in them and the second all the pictures with cats. The machine learned nothing beforehand; the model works on its own to discover patterns and information that were previously undetected. Unsupervised learning therefore deals mainly with unlabeled data.
The working of an unsupervised learning algorithm is shown in the block diagram (fig 1.1). Unsupervised learning techniques are further grouped into two types:
1. Clustering.
2. Association.
1. Clustering:
Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have little or no similarity with the objects of another group. Clustering algorithms work by iteratively moving data points closer to their cluster centers and further away from data points in other clusters. Cluster analysis finds the commonalities between the data objects and categorizes them according to the presence or absence of those commonalities. A block diagram of clustering is shown below (Figure 1.2).
Example: Let's understand the clustering technique with the real-world example of a shopping mall. When we visit a mall, we can observe that items with similar usage are grouped together: t-shirts are grouped in one section and, in the fruit section, apples, bananas, mangoes, etc. are kept in separate groups, so items of the same type are kept in one place. The clustering technique works in the same way.
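As a minimal illustration in Python (assuming NumPy and scikit-learn are installed; the two-feature toy points below are invented), a clustering algorithm can group unlabeled points purely by their similarity:

# Group unlabeled 2-D points into two clusters by similarity.
import numpy as np
from sklearn.cluster import KMeans

# Two loose groups of points (think of them as simple image feature vectors).
X = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
              [5.0, 5.2], [5.3, 4.9], [4.8, 5.1]])

model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)      # cluster index (0 or 1) for every point
print(labels)                      # e.g. [0 0 0 1 1 1] (cluster ids may swap)
print(model.cluster_centers_)      # center of each discovered group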
2. Association:
An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategies more effective by showing how frequently an itemset occurs in a transaction. A typical example is Market Basket Analysis.
Market Basket Analysis is one of the key techniques used by large retailers to discover associations between items. It allows retailers to identify relationships between the items that people frequently buy together. Given a set of transactions, we can find rules that predict the occurrence of an item based on the occurrences of the other items in the transaction.
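The following pure-Python sketch shows the core idea behind such rules by computing support and confidence for one candidate rule; the transactions and item names are invented for illustration:

# Market basket sketch: support and confidence for the rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]
antecedent, consequent = {"bread"}, {"butter"}

n = len(transactions)
both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
ante = sum(1 for t in transactions if antecedent <= t)

support = both / n          # fraction of all baskets containing bread and butter together
confidence = both / ante    # of the baskets with bread, the fraction that also have butter
print(f"support={support:.2f}, confidence={confidence:.2f}")   # support=0.60, confidence=0.75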
K-means Clustering:
K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process. K-Means divides the objects into clusters such that objects within a cluster share similarities and are dissimilar to the objects belonging to other clusters.
K-means is a simple and the most commonly used clustering algorithm. It starts with some randomly selected data points called centroids, which serve as the starting points of the clusters. Repeated calculations are then used to optimize the positions of the centroids.
K-means stops optimizing the clusters in two cases:
i. The centroids have stabilized.
ii. The defined number of iterations has been reached.
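A from-scratch sketch of this loop in Python (NumPy assumed; the two-blob toy data is generated just for the demonstration):

# Minimal k-means: random initial centroids, then repeated assignment and update steps.
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k randomly selected data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:              # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Stopping case i: the centroids have stabilized.
        if np.linalg.norm(new_centroids - centroids) < tol:
            break                             # stopping case ii is the max_iters limit
        centroids = new_centroids
    return labels, centroids

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)
print(centroids)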
Advantages of K-means clustering:
1. Simple and easy to implement: The k-means algorithm is easy to understand and
implement, making it a popular choice for clustering tasks.
2. Fast and efficient: K-means is computationally efficient and can handle large
datasets with high dimensionality.
3. Scalability: K-means can handle large datasets with a large number of data points
and can be easily scaled to handle even larger datasets.
Kernel K-means:
Kernel k-means is used to identify clusters that are not linearly separable in the input space. The Euclidean distance used in k-means is replaced by a kernel function, so distances are effectively computed in a higher-dimensional feature space.
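A NumPy sketch of this idea (RBF kernel assumed; the toy data is invented): the squared distance from a point to a cluster center is computed entirely from kernel values, because the center only exists implicitly in the feature space.

# Kernel k-means sketch: Euclidean distances are replaced by distances in kernel feature space.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def kernel_kmeans(K, k, max_iters=100, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)              # random initial assignment
    for _ in range(max_iters):
        dist = np.zeros((n, k))
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                dist[:, c] = np.inf                  # ignore empty clusters
                continue
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/|C|) * sum_j K_ij + (1/|C|^2) * sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):       # stop once assignments stabilize
            break
        labels = new_labels
    return labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
print(kernel_kmeans(rbf_kernel(X, gamma=0.5), k=2))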
Dimensionality Reduction:
Dimensionality reduction refers to techniques for reducing the number of input variables in the training data. By reducing the number of input variables we can build simpler machine learning models. It is a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it still provides similar information. When dealing with high-dimensional data, it is useful to reduce the dimensions: by projecting the data onto a lower-dimensional space we can capture its essence. This is called dimensionality reduction. It is commonly used in fields that deal with high-dimensional data, such as speech recognition, signal processing, and bioinformatics, and it is also useful for data visualization, noise reduction, and cluster analysis.
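A quick illustration with scikit-learn (assumed available), using PCA, which is described in the next section, purely as a convenient reduction method: the 64-dimensional digits dataset is projected down to 2 dimensions so that it can be plotted.

# Dimensionality reduction example: project 64-dimensional digit images onto 2 dimensions.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)     # X has shape (1797, 64)
X2 = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X2.shape)          # (1797, 64) -> (1797, 2)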
Principal Component Analysis (PCA):
Principal Component Analysis (PCA) is an unsupervised learning algorithm used for dimensionality reduction in machine learning. PCA is a very popular tool for exploratory data analysis and predictive modeling. It finds the most significant features in a large dataset and makes the data easy to plot in 2D and 3D.
Fig. 1.4 Principal components PC1 and PC2 orthogonal to each other.
In Fig. 1.4 several points are plotted on a 2D plane, along with two principal components, PC-1 and PC-2. PC-1 points in the direction of maximum variance in the data; PC-2 is orthogonal to PC-1 and captures the largest remaining variance.
Steps involved in the PCA algorithm:
Step 1: Get the dataset: Take the input dataset whose dimensions are to be reduced.
Step 2: Represent the data in a structure: Represent the data as a two-dimensional matrix X of independent variables, where each row is a data item and each column is a feature. The number of columns is the dimension of the dataset.
Step 3: Standardize the data (normalization): Features with a high variance would otherwise dominate features with a lower variance, so each column is standardized. Name the resulting matrix Z.
Step 4: Calculate the covariance matrix: Take the matrix Z, transpose it, and multiply the transpose by Z. The resulting matrix is the covariance matrix of Z.
Step 5: Calculate the eigenvalues and eigenvectors: Compute the eigenvalues and eigenvectors of the covariance matrix of Z. The eigenvectors are the directions of the axes carrying the most information, and the corresponding eigenvalues give the amount of variance along each of those directions.
Step 6: Sort the eigenvectors: Sort the eigenvalues in decreasing order and arrange the corresponding eigenvectors, in the same order, as the columns of a matrix P. The resulting sorted matrix is denoted P*.
Step 7: Calculate the new features (principal components): Multiply Z by P* to obtain the matrix Z* of new features; each column of Z* is a principal component.
Step 8: Remove unimportant features from the new dataset: In this new dataset, keep only the important features (the leading principal components) and remove the less important ones.
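The steps can be followed directly in NumPy; in the sketch below the data matrix X is randomly generated purely for illustration.

# PCA following the steps above (rows of X are data items, columns are features).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # Steps 1-2: the dataset as a 2-D matrix

Z = (X - X.mean(axis=0)) / X.std(axis=0)         # Step 3: standardize each feature
C = (Z.T @ Z) / (len(Z) - 1)                     # Step 4: covariance matrix of Z

eig_vals, eig_vecs = np.linalg.eigh(C)           # Step 5: eigenvalues and eigenvectors
order = np.argsort(eig_vals)[::-1]               # Step 6: sort by decreasing eigenvalue
eig_vals, P_star = eig_vals[order], eig_vecs[:, order]

Z_new = Z @ P_star                               # Step 7: new features (principal components)
Z_reduced = Z_new[:, :2]                         # Step 8: keep only the top 2 components
print(Z_reduced.shape)                           # (100, 2)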
Disadvantages of PCA:
3. Loss of information:
PCA accounts for the greatest amount of variation across the data's characteristics. If the number of principal components is not chosen carefully, some information may be missed compared with the original list of characteristics. Although dimensionality reduction is beneficial, it comes at a cost: some loss of information is an inevitable part of principal component analysis, and the trade-off between dimensionality reduction and information loss has to be managed whenever PCA is employed.
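One common way to manage this trade-off is to look at how much variance each component retains and keep just enough components; a short scikit-learn sketch (the 95% threshold is an arbitrary choice for illustration):

# Choose the number of components by the fraction of variance they retain.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cum >= 0.95)) + 1   # smallest number of components keeping ~95% of the variance
print(n_components, cum[n_components - 1])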
Kernel PCA (KPCA) applies the kernel trick to PCA so that nonlinear structure in the data can be captured.
Advantages of KPCA:
▪ Flexibility: by capturing nonlinear patterns, it is more flexible and adaptable to various data types. Kernel PCA is therefore used in many domains, including image recognition and speech processing.
Disadvantages of KPCA:
▪ Choosing an appropriate kernel function and its parameters can be challenging and may
require expert knowledge or extensive experimentation.
▪ Kernel PCA can be computationally expensive, especially for large datasets, as it requires
the computation of the kernel matrix for all pairs of data points.
▪ It may not always be easy to interpret the results of kernel PCA, as the transformed data
may not have a clear interpretation in the original feature space.
▪ Kernel PCA is not suitable for datasets with many missing values or outliers, as it assumes
a complete and consistent dataset.
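A short kernel PCA example with scikit-learn (RBF kernel; the concentric-circles data is a standard toy set on which ordinary, linear PCA cannot separate the two rings):

# Kernel PCA with an RBF kernel on data that is not linearly separable in input space.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)     # in this space the two rings become (nearly) linearly separable
print(X_kpca.shape)                # (400, 2)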
Recommendation System:
Matrix Factorization:
Matrix factorization is a way to generate latent features by multiplying two different kinds of entities. Collaborative filtering applies matrix factorization to identify the relationship between item entities and user entities. Given users' ratings of shop items as input, we would like to predict how the users would rate the remaining items, so that recommendations can be made based on those predictions.
Assume we have a table of 5 users and 5 movies, where the known ratings are integers ranging from 1 to 5 and some entries are missing.
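A gradient-descent sketch of this idea in NumPy; the 5x5 rating matrix below is invented, with 0 standing for an unknown rating, and the number of latent features, learning rate and regularization strength are arbitrary choices:

# Matrix factorization sketch: R (users x items) is approximated by P @ Q.T,
# learned only from the known (non-zero) ratings.
import numpy as np

R = np.array([[5, 3, 0, 1, 4],
              [4, 0, 0, 1, 3],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0],
              [2, 0, 4, 0, 5]], dtype=float)     # 0 = unknown rating (invented example)

n_users, n_items, k = R.shape[0], R.shape[1], 2
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))     # latent user features
Q = rng.normal(scale=0.1, size=(n_items, k))     # latent item features

lr, reg = 0.01, 0.02
for _ in range(5000):
    for u, i in zip(*np.nonzero(R)):             # only observed ratings drive the updates
        err = R[u, i] - P[u] @ Q[i]
        p_u = P[u].copy()
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_u - reg * Q[i])

R_hat = P @ Q.T                                  # predicted ratings, including the missing entries
print(np.round(R_hat, 1))

Once the factors are learned, the empty cells of the rating table are simply read off from P @ Q.T, and the items with the highest predicted ratings can be recommended to each user.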
Matrix Completion:
Matrix completion is a method for recovering missing information in a data matrix, such as ratings, preferences, or measurements. It originates from machine learning and usually deals with highly sparse matrices: missing or unknown entries are estimated from a low-rank approximation built from the known data. It has applications in recommender systems, computer vision, and natural language processing. In many cases the data matrix is not completely random but has some underlying structure or pattern reflecting prior knowledge or constraints; for example, ratings may be influenced by user profiles, images may be low rank or sparse, and words may belong to certain topics.
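A simple iterative sketch of matrix completion in NumPy (sometimes called hard-impute): fill the missing entries with an initial guess, project the matrix onto a low rank with a truncated SVD, copy the observed entries back, and repeat. The rating matrix is the same invented 5x5 example used above, and the rank-2 assumption is arbitrary.

# Matrix completion sketch: alternate a low-rank SVD projection with re-inserting the known entries.
import numpy as np

R = np.array([[5, 3, 0, 1, 4],
              [4, 0, 0, 1, 3],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0],
              [2, 0, 4, 0, 5]], dtype=float)
observed = R > 0                                 # mask of known entries (0 = missing)

X = R.copy()
X[~observed] = R[observed].mean()                # initial guess for the unknown entries
rank = 2
for _ in range(100):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X = (U[:, :rank] * s[:rank]) @ Vt[:rank]     # best rank-2 approximation of the current X
    X[observed] = R[observed]                    # keep the known ratings fixed
print(np.round(X, 1))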
Questions:
1. What is Unsupervised Machine learning? Discuss different types of unsupervised machine
learning.
2. Write down the advantages and disadvantages of K-means clustering.
3. Write down the advantages and disadvantages of PCA.
4. What are matrix factorization and matrix completion?
5. Explain Kernel PCA.