UNIT 01 DISTANCE-BASED MODELS
NEAREST NEIGHBOR MODELS
Nearest neighbour methods are a good starting point since they readily encode basic smoothness
intuitions and are easy to program, forming a useful baseline method.
In a classification problem each input vector x has a corresponding class label c ∈ {1, ..., C}. Given
a dataset of N such training examples, D = {(x_n, c_n), n = 1, ..., N}, and a novel input x, we aim to return
the correct class c(x). A simple but often effective strategy for this supervised learning problem can
be stated as: for a novel x, find the nearest input in the training set and use the class of this nearest input.
For vectors x and x' representing two different datapoints, we measure 'nearness' using a dissimilarity
function d(x, x'). A common dissimilarity is the squared Euclidean distance d(x, x') = (x − x')^T (x − x'),
which can be more conveniently written (x − x')^2. Based on the squared Euclidean distance, the
decision boundary is determined by the lines which are the perpendicular bisectors of the closest
training points with different class labels, as shown in the figure below. This is called a Voronoi
tessellation.
In nearest neighbour classification a new vector is assigned the label of the nearest vector in the
training set. Here there are three classes, with training points given by the circles, along with their
class. The dots indicate the class of the nearest training vector. The decision boundary is piecewise
linear, with each segment corresponding to the perpendicular bisector between two datapoints
belonging to different classes, giving rise to a Voronoi tessellation of the input space.
The nearest neighbour algorithm is simple and intuitive. There are some issues:
• How should we measure the distance between points? Whilst the squared Euclidean distance
is popular, it may not always be appropriate. A fundamental limitation of the Euclidean
distance is that it does not take into account how the data is distributed.
• The whole dataset needs to be stored to make a classification. This can be addressed by a
method called data editing, in which datapoints that have little or no effect on the decision
boundary are removed from the training dataset.
The nearest neighbour algorithm to classify a new vector x, given a set of training data
D = {(x_n, c_n), n = 1, ..., N}:
1: Calculate the dissimilarity of the test point x to each of the stored points, d_n = d(x, x_n), n = 1, ..., N.
2: Find the training point x_{n*} which is nearest to x: n* = argmin_n d(x, x_n).
3: Assign the class label c(x) = c_{n*}.
4: In the case that there are two or more equidistant neighbours with different class labels, the most
numerous class is chosen. If there is no single most numerous class, we use the K-nearest-neighbours
rule.
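As an illustration, the following is a minimal sketch of steps 1-3 of the nearest neighbour rule in Python with NumPy, using the squared Euclidean dissimilarity; the function name and the small training set are our own, not part of the original text.

import numpy as np

def nearest_neighbour_predict(X_train, y_train, x):
    # d_n = d(x, x_n): squared Euclidean dissimilarity to every stored point
    diffs = X_train - x
    d = np.einsum('ij,ij->i', diffs, diffs)
    # n* = argmin_n d(x, x_n): index of the nearest training point
    n_star = np.argmin(d)
    # assign the class label c(x) = c_{n*}
    return y_train[n_star]

# usage on three hypothetical 2-D training points from two classes
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
y_train = np.array([0, 1, 1])
print(nearest_neighbour_predict(X_train, y_train, np.array([0.9, 0.8])))  # -> 1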
The figures illustrate the decision regions of a K-nearest neighbour classifier; the shading represents
the predicted probability distribution over the five classes. (middle) 5-nearest neighbour. (right) 7-
nearest neighbour.
K-MEANS
The K-means problem is NP-hard, which means that no efficient algorithm is known for finding the
global minimum, so we resort to a heuristic algorithm.
An algorithm that tries to minimize the distance of the points in a cluster to their centroid is the
k-means clustering algorithm.
K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the
distances to assign a point to a cluster. In K-Means, each cluster is associated with a centroid.
The main objective of the K-Means algorithm is to minimize the sum of distances between the points
and their respective cluster centroid.
The outline of the algorithm is given below. The algorithm iterates between partitioning the data
using the nearest-centroid decision rule and recalculating centroids from a partition.
Algorithm 8.1: KMeans(D, K) - K-means clustering using Euclidean distance Dis2.
Input: data D ⊆ R^d; number of clusters K ∈ N.
Output: K cluster means μ1, ..., μK ∈ R^d.
1 randomly initialise K vectors μ1, ..., μK ∈ R^d;
2 repeat
3   assign each x ∈ D to argmin_j Dis2(x, μj);
4   for j = 1 to K do
5     Dj ← {x ∈ D | x assigned to cluster j};
6     μj ← (1/|Dj|) Σ_{x∈Dj} x;
7   end
8 until no change in μ1, ..., μK;
9 return μ1, ..., μK;
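A minimal NumPy sketch of Algorithm 8.1 follows; initialising the means by drawing K distinct points from D and capping the number of iterations are implementation choices of this sketch, not part of the pseudocode above.

import numpy as np

def kmeans(D, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1: randomly initialise the K means (here: K distinct points drawn from D)
    mu = D[rng.choice(len(D), size=K, replace=False)]
    assign = np.zeros(len(D), dtype=int)
    for _ in range(max_iter):
        # 3: assign each x in D to its nearest mean under squared Euclidean distance
        d2 = ((D[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # 4-7: recompute each mean as the average of the points assigned to it
        new_mu = np.array([D[assign == j].mean(axis=0) if np.any(assign == j) else mu[j]
                           for j in range(K)])
        # 8: stop when the means no longer change
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, assign

# usage on hypothetical 2-D data:
# mu, assign = kmeans(np.random.default_rng(1).normal(size=(300, 2)), K=3)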
The figure given below demonstrates the algorithm on a small data set with three clusters.
(left) First iteration of 3-means on Gaussian mixture data. The dotted lines are the Voronoi boundaries
resulting from randomly initialised centroids; the violet solid lines are the result of the recalculated
means. (middle) Second iteration, taking the previous partition as starting point (dotted line). (right)
Third iteration with stable clustering.
The next figure shows how an unfortunate initialisation of the centroids can lead to a sub-optimal
solution. In practice it is advisable to run the algorithm a number of times and select the solution with
the smallest within-cluster scatter.
In the figure given above: (left) First iteration of 3-means on the same data as the previous figure with
differently initialised centroids. (right) 3-means has converged to a sub-optimal clustering.
CLUSTERING AROUND MEDOIDS
The K-means algorithm can be used with a different distance metric, but this also changes the objective
function being minimised. This leads to the K-medoids algorithm, which additionally requires the
exemplars to be data points. The K-medoids algorithm is given below.
Algorithm 8.2: KMedoids(D, K, Dis) - K-medoids clustering using arbitrary distance metric Dis.
Input: data D ⊆ X; number of clusters K ∈ N; distance metric Dis : X × X → R.
Output: K medoids μ1, ..., μK ∈ D, representing a predictive clustering of X.
1 randomly pick K data points μ1, ..., μK ∈ D;
2 repeat
3   assign each x ∈ D to argmin_j Dis(x, μj);
4   for j = 1 to K do
5     Dj ← {x ∈ D | x assigned to cluster j};
6     μj ← argmin_{y∈Dj} Σ_{x∈Dj} Dis(x, y);
7   end
8 until no change in μ1, ..., μK;
9 return μ1, ..., μK;
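A compact NumPy sketch of this loop is given below; working from a precomputed matrix of pairwise distances and capping the number of iterations are choices of the sketch, not part of Algorithm 8.2.

import numpy as np

def kmedoids(dist, K, max_iter=100, seed=0):
    # dist: precomputed n x n matrix of pairwise distances Dis(x, y)
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    # 1: randomly pick K data points as the initial medoids
    medoids = rng.choice(n, size=K, replace=False)
    assign = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # 3: assign each point to its nearest medoid
        assign = dist[:, medoids].argmin(axis=1)
        # 6: within each cluster, the new medoid minimises the total distance
        #    to the other members of that cluster
        new_medoids = medoids.copy()
        for j in range(K):
            members = np.where(assign == j)[0]
            if len(members) > 0:
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[within.argmin()]
        # 8: stop when the medoids no longer change
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, assign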
Calculating the medoid of a cluster requires examining all pairs of points, whereas calculating the
mean requires just a single pass through the points; this can be prohibitive for large data sets.
An alternative algorithm called partitioning around medoids (PAM) tries to improve a clustering
locally by swapping medoids with other data points. The quality Q of a clustering is calculated as the
total distance over all points to their nearest medoid.
Algorithm 8.3: PAM(D, K, Dis) - partitioning around medoids clustering using arbitrary distance
metric Dis.
Input: data D ⊆ X; number of clusters K ∈ N; distance metric Dis : X × X → R.
Output: K medoids μ1, ..., μK ∈ D, representing a predictive clustering of X.
1 randomly pick K data points μ1, ..., μK ∈ D;
2 repeat
3   assign each x ∈ D to argmin_j Dis(x, μj);
4   for j = 1 to K do
5     Dj ← {x ∈ D | x assigned to cluster j};
6   end
7   Q ← Σ_j Σ_{x∈Dj} Dis(x, μj);
8   for each medoid m and each non-medoid o do
9     calculate the improvement in Q resulting from swapping m with o;
10  end
11  select the pair with maximum improvement and swap;
12 until no further improvement possible;
13 return μ1, ..., μK;
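The swap step of PAM can be sketched as follows in NumPy; the function names and the exhaustive evaluation of every (medoid, non-medoid) pair are our own illustration of lines 7-11, again working from a precomputed distance matrix.

import numpy as np

def clustering_quality(dist, medoids):
    # Q: total distance of every point to its nearest medoid
    return dist[:, medoids].min(axis=1).sum()

def best_swap(dist, medoids):
    # evaluate every (medoid m, non-medoid o) swap and return the best improvement
    n = dist.shape[0]
    base = clustering_quality(dist, medoids)
    best_gain, best_medoids = 0.0, None
    for i in range(len(medoids)):
        for o in range(n):
            if o in medoids:
                continue
            candidate = medoids.copy()
            candidate[i] = o
            gain = base - clustering_quality(dist, candidate)
            if gain > best_gain:
                best_gain, best_medoids = gain, candidate
    # best_medoids is None when no swap improves Q, i.e. PAM has converged
    return best_gain, best_medoids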
The two data sets in the figure are identical except for a rescaling of the y-axis, yet K-means
finds entirely different clusterings. (left) On this data 2-means detects the right clusters.
(right) After rescaling the y-axis, this configuration has a higher between-cluster scatter than the
intended one.
SILHOUETTES
An interesting technique is the use of silhouettes. For any data point x_i, let d(x_i, D_j) denote the
average distance of x_i to the data points in cluster D_j, and let j(i) denote the index of the cluster that
x_i belongs to. Furthermore, let a(x_i) = d(x_i, D_j(i)) be the average distance of x_i to the points in its
own cluster D_j(i), and let b(x_i) = min_{k ≠ j(i)} d(x_i, D_k) be the average distance to the points in its
neighbouring cluster. We would expect a(x_i) to be considerably smaller than b(x_i), but this cannot
be guaranteed.
It is, however, conceivable that a(x_i) > b(x_i), in which case the difference b(x_i) − a(x_i) is negative.
This describes the situation that, on average, the members of the neighbouring cluster are closer to
x_i than the members of its own cluster. In order to get a normalised value we divide by a(x_i) in this
case. This leads to the following definition:

s(x_i) = (b(x_i) − a(x_i)) / max(a(x_i), b(x_i))
A silhouette then sorts and plots s(x_i) for each instance, grouped by cluster. Examples are shown in
the figure. We have used squared Euclidean distance in the construction of the silhouette, but the method can be
applied to other distance metrics. We can clearly see that the first clustering is much better than the
second. In addition to the graphical representation, we can compute average silhouette values per
cluster and over the whole data set.
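The silhouette values can be computed directly from a pairwise distance matrix. The NumPy sketch below mirrors the definitions of a(x_i), b(x_i) and s(x_i) above; the function name is ours, and it assumes at least two clusters.

import numpy as np

def silhouette_values(dist, labels):
    # dist: n x n matrix of pairwise distances; labels: cluster index of each point
    n = dist.shape[0]
    clusters = np.unique(labels)
    s = np.zeros(n)
    for i in range(n):
        own = labels[i]
        own_members = np.where(labels == own)[0]
        if len(own_members) > 1:
            # a(x_i): average distance to the other points in its own cluster
            a = dist[i, own_members[own_members != i]].mean()
        else:
            a = 0.0
        # b(x_i): smallest average distance to the points of any other cluster
        b = min(dist[i, labels == k].mean() for k in clusters if k != own)
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s  # sort and plot these per cluster to obtain the silhouette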
HIERARCHICAL CLUSTERING
A method that uses tree structures to represent clusterings is called hierarchical clustering. Such
trees could use features to navigate the instance space, similar to decision trees; here we consider trees
called dendrograms, which are purely defined in terms of a distance measure. Because dendrograms use
features only indirectly, as the basis on which the distance measure is calculated, they partition the
given data rather than the entire instance space, and hence represent a descriptive clustering rather
than a predictive one.
Given a data set D, a dendrogram is a binary tree with the elements of D at its leaves. An internal
node of the tree represents the subset of elements in the leaves of the subtree rooted at that node. The
level of a node is the distance between the two clusters represented by the children of the node. Leaves
have level 0.
A linkage function L : 2^X × 2^X → R calculates the distance between arbitrary subsets of the instance
space, given a distance metric Dis : X × X → R.
The most common linkage functions are as follows:
• Single linkage defines the distance between two clusters as the smallest pairwise distance
between elements from each cluster.
• Complete linkage defines the distance between two clusters as the largest pairwise distance.
• Average linkage defines the cluster distance as the average pairwise distance.
• Centroid linkage defines the cluster distance as the distance between the cluster means.
These linkage functions can be defined mathematically as follows:

L_single(A, B) = min_{x∈A, y∈B} Dis(x, y)
L_complete(A, B) = max_{x∈A, y∈B} Dis(x, y)
L_average(A, B) = ( Σ_{x∈A} Σ_{y∈B} Dis(x, y) ) / ( |A| · |B| )
L_centroid(A, B) = Dis( (1/|A|) Σ_{x∈A} x , (1/|B|) Σ_{y∈B} y )
All these linkage functions coincide for singleton clusters: L({x}, {y}) = Dis(x, y).
However, for larger clusters they start to diverge. For example, suppose Dis(x, y) ...
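As an illustration, the following NumPy sketch computes the four linkage functions for two clusters given as arrays of points; the function names and the choice of Euclidean distance as the underlying Dis are ours.

import numpy as np

def pairwise_dist(A, B):
    # Dis(x, y): Euclidean distance between every x in A and every y in B
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_linkage(A, B):
    return pairwise_dist(A, B).min()    # smallest pairwise distance

def complete_linkage(A, B):
    return pairwise_dist(A, B).max()    # largest pairwise distance

def average_linkage(A, B):
    return pairwise_dist(A, B).mean()   # average pairwise distance

def centroid_linkage(A, B):
    # distance between the cluster means
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

# for singleton clusters all four functions coincide with Dis(x, y)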
In the second case, there is a higher chance for two documents to appear in the same bucket at least
once, as they have more opportunities (20 vs 10) and fewer elements of the signature are compared (5 vs 10).
A higher b implies a lower similarity threshold (more false positives), and a lower b implies a higher
similarity threshold (more false negatives).
Example:
Setup:
100k documents, each stored as a signature of length 100.
Signature matrix: 100 × 100,000.
Brute-force comparison of signatures would require 100000C2 ≈ 5 billion comparisons (quite a lot!).
Let's take b = 20 bands of r = 5 rows each.
Similarity threshold (t): 80%.
We want two documents (D1 & D2) with 80% similarity to be hashed into the same bucket for at least one
of the 20 bands.
P(D1 & D2 identical in a particular band) = (0.8)^5 = 0.328
P(D1 & D2 not identical in any of the 20 bands) = (1 − 0.328)^20 = 0.00035
This means in this scenario we have about a 0.035% chance of a false negative for 80%-similar documents.
Also, we want two documents (D3 & D4) with 30% similarity not to be hashed into the same bucket in
any of the 20 bands (threshold = 80%).
P(D3 & D4 identical in a particular band) = (0.3)^5 = 0.00243
P(D3 & D4 identical in at least one of the 20 bands) = 1 − (1 − 0.00243)^20 = 0.0474
This means in this scenario we have about a 4.74% chance of a false positive for 30%-similar documents.
So we can see that we have some false positives and a few false negatives. These proportions will vary
with the choice of b and r.
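The numbers above follow from the banding formula: for two documents with similarity s, the probability of agreeing on all r rows of at least one of the b bands is 1 − (1 − s^r)^b. A short Python check (the function name is ours) reproduces the figures of the example.

def prob_share_bucket(s, b, r):
    # probability that two documents with similarity s agree on all r rows
    # of at least one of the b bands
    return 1 - (1 - s ** r) ** b

b, r = 20, 5
print(1 - prob_share_bucket(0.8, b, r))  # false-negative rate at 80% similarity, ~0.00035
print(prob_share_bucket(0.3, b, r))      # false-positive rate at 30% similarity, ~0.0474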
What we want is something like the curve below: if two documents have similarity greater than the
threshold, then the probability of them sharing the same bucket in at least one of the bands should
be 1, and otherwise it should be 0.
Figure: probability of two documents sharing a bucket, plotted against the similarity s of the two sets;
ideally this is a step function, with no chance of sharing a bucket below the threshold t and certainty above it.
NON-PARAMETRIC REGRESSION
Nonparametric regression, like linear regression, estimates mean outcomes for a given set of
covariates. Unlike linear regression, nonparametric regression is agnostic about the functional form
between the outcome and the covariates and is therefore not subject to misspecification error.
In nonparametric regression, you do not specify the functional form. You specify the dependent
variable (the outcome) and the covariates. You specify y, x1, x2 and x3 to fit

y = g(x1, x2, x3) + ε

The method does not assume that g() is linear; it could just as well be

y = β1 x1 + β2 x2^2 + β3 x1 x2 + β4 x3 + ε

The method does not even assume the function is linear in the parameters. It could just as well be

y = β1 x1^β2 + cos(x2 x3) + ε

Or it could be anything else.
The result is not returned to you in algebraic form, but predicted values and derivatives can be
calculated. To fit whatever the model is, you type

. npregress kernel y x1 x2 x3

npregress needs more observations than linear regression to produce consistent estimates, of course,
but perhaps not as many extra observations as you would expect. A model like this one could easily
be fit on 500 observations.
The goal of a regression analysis is to produce a reasonable approximation to the unknown response
function m, where for N data points (x_i, y_i) the relationship can be modelled as

y_i = m(x_i) + ε_i,  i = 1, ..., N

Note: m(x) = E[y | x] if E[ε | x] = 0.
We have different ways to model the conditional expectation function (CEF) m(·):
• Parametric approach: m(·) is known and smooth. It is fully described by a finite set of parameters to
be estimated. Easy interpretation. For example, a linear model

y_i = x_i'β + ε_i,  i = 1, ..., N

• Nonparametric approach: m(·) is smooth and flexible, but unknown. Let the data determine the shape of
m(·). Difficult interpretation.

y_i = m(x_i) + ε_i,  i = 1, ..., N

• Semiparametric approach: m(·) has some parameters to be estimated, but some parts are
determined by the data.

y_i = x_i'β + m(z_i) + ε_i,  i = 1, ..., N
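As a concrete illustration of the nonparametric approach with a single covariate, the following is a minimal Nadaraya-Watson kernel regression sketch in Python; the Gaussian kernel, the bandwidth h and the simulated data are our own choices, not part of the original text.

import numpy as np

def kernel_regression(x_train, y_train, x_query, h=0.5):
    # m_hat(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)
    # with a Gaussian kernel K(u) = exp(-u^2 / 2)
    u = (x_query[:, None] - x_train[None, :]) / h
    K = np.exp(-0.5 * u ** 2)
    return (K * y_train).sum(axis=1) / K.sum(axis=1)

# usage on simulated data y = m(x) + eps with m(x) = sin(x)
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)
x_grid = np.linspace(0, 2 * np.pi, 5)
print(kernel_regression(x, y, x_grid))  # roughly follows sin(x_grid)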
ENSEMBLE LEARNING
Machine learning models suffer from bias and/or variance. Bias is the difference between the value
predicted by the model and the actual value. Bias is introduced when the model does not take the
variation of the data into account and creates an overly simple model. Such a simple model does not
follow the patterns of the data, and hence the model gives errors on the training as well as the testing
data, i.e. the model has high bias.
When the model follows even the random quirks of the data, treating them as patterns, it may do very
well on the training dataset, i.e. it gives low bias, but it fails on the test data and gives high variance.
Therefore, to improve the accuracy of the model, ensemble learning methods were developed.
Ensemble learning is a machine learning concept in which several models are trained using machine learning
algorithms. It combines low-performing classifiers (also called weak learners or base learners) and
combines the individual model predictions into the final prediction.
On the basis of the type of base learners, ensemble methods can be categorized as homogeneous and
heterogeneous. If the base learners are the same, it is a homogeneous ensemble method; if the base
learners are different, it is a heterogeneous ensemble method.
Figure: homogeneous ensemble methods combine the same kind of base learners; heterogeneous
ensemble methods combine different base learners.
Ensemble techniques are classified into three types:
• Bagging
• Boosting
• Stacking
BAGGING AND RANDOM FORESTS
Bagging, short for ‘bootstrap aggregating’, is a simple but highly effective ensemble method that
creates diverse models on different random samples of the original data set. These samples are taken
uniformly with replacement and are known as bootstrap samples.
Because samples are taken with replacement, the bootstrap sample will in general contain duplicates,
and hence some of the original data points will be missing, even if the bootstrap sample is of the same
size as the original data set. This is exactly what we want, as differences between the bootstrap
samples will create diversity among the models in the ensemble.
The figure given above shows an ensemble of five basic linear classifiers built from bootstrap samples with
bagging. The decision rule is majority vote, leading to a piecewise linear decision boundary.
(right) If we turn the votes into probabilities, we see that the ensemble is effectively a grouping model:
each instance space segment obtains a slightly different probability.
A bagging algorithm, which returns the ensemble as a set of models is given. We can choose to
combine the predictions from the different models by voting — the class predicted by the majority of
models wins — or by averaging, which is more appropriate if the base classifiers output scores or
probabilities.
Algorithm: Bagging(D, T, A) - train an ensemble of models from bootstrap samples.
Input: data set D; ensemble size T; learning algorithm A.
Output: ensemble of models whose predictions are to be combined by voting or averaging.
1 for t = 1 to T do
2   build a bootstrap sample Dt from D by sampling |D| data points with replacement;
3   run A on Dt to produce a model Mt;
4 end
5 return {Mt | 1 ≤ t ≤ T}
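A minimal Python sketch of this bagging loop follows; the callable learn standing in for algorithm A, the majority vote via bincount, and the scikit-learn decision stumps mentioned in the usage comment are our own choices for illustration.

import numpy as np

def bagging(X, y, T, learn, seed=0):
    # learn(X_sample, y_sample) plays the role of algorithm A: it returns a fitted model
    rng = np.random.default_rng(seed)
    n = len(X)
    ensemble = []
    for t in range(T):
        # build a bootstrap sample D_t by sampling |D| data points with replacement
        idx = rng.integers(0, n, size=n)
        ensemble.append(learn(X[idx], y[idx]))
    return ensemble

def vote(ensemble, X):
    # combine the T predictions per instance by majority vote (integer class labels assumed)
    preds = np.stack([m.predict(X) for m in ensemble])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

# usage with scikit-learn decision stumps as an (assumed) base learner:
# from sklearn.tree import DecisionTreeClassifier
# ensemble = bagging(X, y, T=5,
#                    learn=lambda Xs, ys: DecisionTreeClassifier(max_depth=1).fit(Xs, ys))
# y_hat = vote(ensemble, X_test)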