Validating Clusters using Hopkins Statistics
Abstract - A novel scheme for cluster validity using a test for the random position hypothesis is proposed. The random position hypothesis is tested against an alternative clustered hypothesis on every cluster produced by a partitioning algorithm. A test statistic such as the well-known Hopkins statistic could be used as a basis to accept or reject the random position hypothesis, which is also the null hypothesis in this case. The Hopkins statistic is known to be a fair estimator of randomness in a data set. The concept is borrowed from the clustering tendency domain and its applicability to validating clusters is shown here using two artificially constructed test data sets.

I. INTRODUCTION

This paper examines a vital issue closely related to the problem of clustering: that of validating the results of clustering procedures. Clustering attempts to partition a data set into self-similar groups, called clusters. The literature abounds with references to clustering procedures and algorithms on a wide variety of data sets. On the other hand, not much attention has been paid to related issues such as clustering tendency and cluster validity. Before subjecting a data set to partitioning, one needs to be confident that the data set exhibits a predisposition to cluster into natural groups, without identifying the groups themselves [1]; this is what constitutes the clustering tendency domain. A data set without inherent natural clusters could be thought of as a random collection of feature vectors, and such random data sets should not be subject to partitioning. Clustering tendency studies could also give information about the general nature of the data and hence indicate which clustering procedure to use, if any, on the data set. Another related issue is the validation of clustering results in a quantitative and objective manner. Clustering tendency studies do not give any information about the number of natural clusters in the data; they only provide (almost always subjective) information about the presence of natural groupings. Furthermore, any partitioning algorithm will find any pre-specified number of clusters (ranging from 2 to n-1, where n is the number of patterns to be classified), irrespective of the actual number of natural groups present. The best and most natural partition is the one that captures the inherent natural grouping in the data, and cluster validity aims at finding the most natural partition among the many partitions generated by the clustering scheme. The issue of the unknown number of clusters also falls under the purview of cluster validity studies.

Statistical measures used to validate clustering results are based on the premise that problems of cluster validity are inherently statistical [1]. A result is usually tested by building an alternative hypothesis and comparing it against a null hypothesis, H0. The null hypothesis is a statement of randomness and could be based on a random graph, a random label or a random position hypothesis. An alternative hypothesis, H1, is then a statement of orderliness and captures the intent of the phrase "the data are clustered". The test is then one of comparing H0 with H1 based on the value of some test statistic T, and deciding whether to accept or reject H0 with a certain degree of certainty. Hubert's Γ statistic [2] and the Goodman-Kruskal γ statistic [3] are well-known examples of test statistics used for cluster validity studies. The Γ statistic compares a clustering structure, which is the result of a clustering scheme, to an a-priori structure, such as one generated by using pre-defined category labels. The γ statistic measures the rank correlation between two ordinal sequences of numbers, one of which might be derived from the clustering structure and the other from an a-priori labeling scheme. For a detailed discussion of statistical cluster validity statistics, tests and indices, the reader is referred to [1].

Several non-statistical cluster validity indices and procedures were independently developed within the fuzzy clustering community to validate partitions obtained using fuzzy clustering algorithms. Prominent among these are the partition coefficient [4], classification entropy [5], the proportion exponent [6], the uniform data functional [7], the nonfuzziness index [8], the information ratio [9], the separation ratio [10] and, perhaps the best known of the validity indices, the Xie-Beni index [11]. These criteria measure the fuzzy overlap between clusters, with or without regard to the geometrical properties of the data set. Some indices work directly on the fuzzy clustering output, while others first convert the results to a hard c-partition before evaluating it. Specialized partitioning schemes such as shell partitioning require the use of specialized validity indices such as the partition density and the shell thickness measure [12]. More recently, advances have been made in the visual assessment of clustering using intensity displays [13,14]. Some indices are known to function poorly across a wide range of data sets, while others are specifically suited to a particular type of data set. In other words, their dependence on the data set and on the type of clustering scheme employed seriously hinders the practical usage of fuzzy validity indices.

In this paper, we borrow a statistical concept from the field of clustering tendency and show its applicability to validating the clustering results generated in a partitioned data set.
FUZZ-IEEE 2004
II. SPARSE SAMPLING TESTS AND THE HOPKINS STATISTIC

The problem of testing for clustering tendency can also be described as a problem of testing for spatial randomness. Unlike statistic-based cluster validity measures, a test for clustering tendency is stated in terms of an internal criterion, and no a-priori information is brought into the analysis [1]. The null hypothesis in most cases is a random position hypothesis, such as:

H0: The patterns are generated by a Poisson process with an intensity of λ patterns per unit volume.

Under H0, the number of patterns falling in a region of volume V has a Poisson distribution with mean λV, and since λ is constant and the numbers of patterns falling in disjoint regions of V are independent random variables, the Poisson process is a reasonable model for randomness (absence of structure). Sparse sampling tests have been shown to have high power against clustered alternative hypotheses. On the other hand, tests based on small interpattern distances (such as nearest neighbor distance tests) have low power against clustered alternatives, primarily because such tests depend heavily on the intensity λ of the Poisson process assumed under H0. Other tests for spatial randomness include scan tests [15], quadrat analysis [16] and second moment structure tests [17].

Sparse sampling tests are based on sampling origins randomly identified in a sampling window. Several tests involving sampling origins have been proposed in the literature, based on a multitude of test statistics such as the Hopkins [18], Holgate [19,20], T-square [21], Eberhardt [22] and Cox-Lewis [23] statistics. These statistics have been compared, but little is known about any one of them outperforming the others. The Hopkins statistic is easy to use and comprehend, and has been shown to be as good as the Holgate statistic [24].

Let X = {x_i | i = 1 to n} be a collection of n patterns in a d-dimensional space such that x_i = (x_i1, x_i2, ..., x_id). Also, let Y = {y_j | j = 1 to m} be m sampling origins placed at random in the d-dimensional sampling window, m << n. A sampling window can be thought of as a subspace of the entire d-dimensional sample space. Two types of distances are defined: u_j as the minimum distance from y_j to its nearest pattern in X, and w_j as the minimum distance from a randomly selected pattern in X to its nearest neighbor (m out of the available n patterns are marked at random for this purpose). The Hopkins statistic in d dimensions is defined as

    H = Σ_{j=1}^{m} u_j^d / (Σ_{j=1}^{m} u_j^d + Σ_{j=1}^{m} w_j^d)    (1)

This statistic compares the nearest-neighbor distribution of randomly selected locations to that of the randomly selected patterns. Under the null hypothesis H0, the distances from the sampling origins to their nearest patterns should, on the average, be the same as the interpattern nearest neighbor distances, implying randomness, and hence H should be about 0.5. However, when the patterns are aggregated or clustered into clouds, the sampling-origin-to-pattern nearest neighbor distances should, on the average, be larger than the randomly selected interpattern nearest neighbor distances, so H should be larger than 0.5, almost equal to 1.0 for very well defined clustered data. By the same reasoning, H is supposed to be much less than 0.5 for regularly spaced data, i.e., data that are neither clustered nor random. To ensure that no pattern is the neighbor of more than one sampling origin, m is chosen to be substantially less than n; it has been suggested that m < 0.1n [17]. With such a condition it can be ensured that all 2m nearest neighbor distances are statistically independent, and H has a beta distribution with parameters (m, m), independent of both the intensity λ and the dimensionality d of the data set. The distribution and the density function of each of the terms in (1) are also known; the individual sums each have a gamma distribution (assuming again that the nearest neighbor distances are all independent random variables). Our studies on random data sets, clustered data sets and regularly spaced data sets show that the Hopkins statistic H consistently has a value of around 0.5, 0.7-0.99 and 0.01-0.3 respectively, and is hence a powerful estimator of randomness.

25-29 July, 2004 Budapest, Hungary

III. RANDOM POSITION TESTS FOR CLUSTER VALIDITY

A cluster is characterized by two main properties: compactness and isolation [1]. A natural cluster is unusually compact and unusually isolated. A clustered data set is ordered because of the presence of natural clusters; in the absence of natural groups, it is a random collection of data points, approximating a Poisson process distribution. In this section we show the applicability of the random position hypothesis and use the Hopkins statistic as a measure for cluster validity. Suppose a data set with 3 compact and isolated clusters, as shown in fig. 1, is subject to partitioning. At c = 2, where c is the number of clusters to be found, most partitioning schemes would club clusters II and III together as one cluster, say cluster A, and identify cluster I as an independent cluster, B. A random position hypothesis test would lead to the rejection of H0 for cluster A because it is still a collection of clustered data points. However, it would be difficult to reject H0 in the case of cluster B because it is the natural cluster, cluster I. A natural cluster, hence, apart from being isolated and compact, is also random within itself. Intracluster data might also exhibit some kind of mutual repulsion, as in fig. 1, and in such a case the null hypothesis H0 should include a statement conforming to a random position as well as a non-random, non-clustered (regularly spaced) distribution. The null hypothesis then cannot be rejected if the data are either random or regularly spaced. However, in real world clustering problems this is rarely the case, and so a random position hypothesis alone should suffice. At c = 3, however, all three natural clusters are most likely to be identified during partitioning, and hence it would be difficult to reject H0 for all three clusters identified. The rejection or acceptance of H0 depends on the value of the Hopkins statistic. At any value of c > 3, any partitioning algorithm would either subdivide or recombine the clusters that were produced by partitioning at c = 3, and hence one might not have reason to reject H0 for any of the clusters (in case clusters are subdivided further), or in some cases just enough evidence to reject H0 (in case clusters are recombined) for some of the generated clusters. The test for cluster validity based on the random position hypothesis can hence be stated as follows: accept the lowest value of c at which it is impossible to reject the null hypothesis H0 for all the clusters, the test being applied one cluster at a time. Let H_i be the value of the Hopkins statistic for the ith cluster at a particular level of clustering; the average value of the statistic is then

    H_av = (1/c) Σ_{i=1}^{c} H_i, for fixed c    (2)

A non-rejection of H0, as given in section II, would mean that, on average, the value of H_av is very close to 0.5. Proceeding from c = 2 to c = n - 1, the lowest value of c that generates H_av ≈ 0.5 most likely generates a partition that identifies the natural clusters in the data.

IV. SIMULATIONS AND RESULTS

In this section we present some cluster validity studies done on two artificially produced two-dimensional data sets: (1) 4-cloud data comprising 160 patterns, and (2) 7-cloud data comprising 1400 patterns. The patterns are generated within a pre-specified window using the C++ rand function, which produces pseudo-random numbers using a seed initialized by the CPU clock time. The data sets are partitioned using FCM [25], ranging from c = 2 to c = 8 for the 4-cloud data set and from c = 2 to c = 11 for the 7-cloud set.

Fig. 1. A three cluster data set: (a) the three natural clusters; (b) clusters identified at c = 2.

The fuzzy partitions are first converted into hard partitions, and then all the generated clusters are subject to the random position hypothesis test. The sampling window in all cases encompasses the entire cluster set, and the number of sampling origins, m, is chosen to be n/10 (or the closest integer value), where n is the number of patterns in the cluster being investigated. In case there were fewer than 10 patterns assigned to a cluster, we choose m = 1. The value of the Hopkins statistic, H, for each of the clusters is then calculated and the process repeated over 5000 runs. The hard partition is retained if H does not fluctuate more than 0.05 over these 5000 runs; in case the fluctuation is more than 0.05, the FCM partitioning is repeated to get a better resultant hard partition. The average of these 5000 values constitutes a consolidated value of the Hopkins statistic for that particular cluster. The average partition Hopkins statistic is then calculated using (2) and the results plotted against the number of clusters, c.

The 4-cloud data is shown in fig. 2. The resultant average partition Hopkins statistic, H_av, is plotted against the number of clusters, c, and shown in fig. 3. As can be seen, the null hypothesis cannot be accepted for c = 2 (H_av = 0.76) and c = 3 (H_av = 0.64). However, it can be accepted with a fair degree of confidence for c = 4 (H_av = 0.47) and thenceforth. Hence, according to the cluster validity criterion enunciated in section III, one could accept c = 4 as the partition identifying the natural groupings in the data, which is indeed the case.

Fig. 2. The 160 pattern 4-cloud data set (3 clusters of 50 patterns each and 1 cluster of 10 patterns).

Fig. 3. The average partition Hopkins statistic, H_av, versus the number of clusters, c, for the 4-cloud data.

The 7-cloud data set is different from the 4-cloud data in that the former is not as well separated as the latter, as shown in fig. 4. The applicability of the Hopkins statistic as a cluster validity index, and of the random position hypothesis test as an appropriate test for cluster validity, is illustrated in a much broader sense in this case. The two clusters in the lower left hand corner of fig. 4 overlap each other and can be argued to be one big tilted 8-shaped cluster. This can be seen from the plot of the average partition Hopkins statistic, H_av, versus the number of clusters in fig. 5; it is difficult to choose between c = 6 (H_av = 0.53) and c = 7 (H_av = 0.48). Other values of c can be rejected outright. Hence the Hopkins statistic does reflect the nuances and subtleties in the data set. In the absence of the overlap, the average partition Hopkins statistic would have indicated a clear preference for c = 7, suggesting natural grouping at that level of partitioning, and would have rightly rejected H0 for at least one cluster in the range 2 ≤ c ≤ 6; c = 7 would then have been the smallest value of c where one cannot reject H0 for any of the 7 clusters.

Fig. 4. The 7-cloud data set (1400 patterns).

Fig. 5. The average partition Hopkins statistic, H_av, versus the number of clusters, c, for the 7-cloud data.
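The procedure of eqs. (1) and (2) can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code: the bounding-box sampling window, the function names and the acceptance tolerance `tol` around 0.5 are our own assumptions (the paper accepts values "very close to 0.5" without fixing a numerical threshold).

```python
import numpy as np

def hopkins(X, m=None, rng=None):
    """Hopkins statistic in d dimensions, eq. (1):
    H = sum(u_j^d) / (sum(u_j^d) + sum(w_j^d)).
    H is about 0.5 for random data, near 1.0 for clustered data,
    and well below 0.5 for regularly spaced data."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if m is None:
        m = max(1, n // 10)  # m < 0.1 n, as suggested in [17]
    # Sampling window assumed here to be the bounding box of the data.
    lo, hi = X.min(axis=0), X.max(axis=0)
    origins = rng.uniform(lo, hi, size=(m, d))
    # u_j: minimum distance from sampling origin y_j to its nearest pattern in X.
    u = np.linalg.norm(origins[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    # w_j: minimum distance from a randomly marked pattern to its nearest neighbor.
    marked = rng.choice(n, size=m, replace=False)
    dw = np.linalg.norm(X[marked][:, None, :] - X[None, :, :], axis=2)
    dw[np.arange(m), marked] = np.inf  # exclude each marked pattern's self-distance
    w = dw.min(axis=1)
    return float(np.sum(u**d) / (np.sum(u**d) + np.sum(w**d)))

def smallest_valid_c(h_av, tol=0.1):
    """Selection rule of section III: the lowest c whose average partition
    Hopkins statistic H_av (eq. (2)) lies within `tol` of 0.5, i.e. the
    lowest c at which H0 cannot be rejected for the partition."""
    for c in sorted(h_av):
        if abs(h_av[c] - 0.5) <= tol:
            return c
    return None
```

As a usage example, feeding `smallest_valid_c` the H_av values reported for the 4-cloud data in fig. 3 ({2: 0.76, 3: 0.64, 4: 0.47}) selects c = 4, matching the partition accepted in section IV.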
V. DISCUSSION AND CONCLUSIONS
… statistic on very small data sets, where each cluster might contain just 4-5 patterns. At such intensities of pattern distribution, the theory of randomness does not hold, and hence the random position hypothesis is meaningless. However, this is a blessing in disguise, because very small data sets are rarely encountered in real world situations; in fact, it could be claimed that the random position test for cluster validity produces better results as the data set gets larger in size (with increasing n). The restriction, however, is that the data should be isolated in groups and compact within the groups. The theory can easily be extended to data in more than two dimensions, because the Hopkins statistic is essentially defined in d dimensions. One could even use other statistics, such as the Cox-Lewis statistic [23], which extends better to d dimensions (d > 2) than the Hopkins statistic. The random position hypothesis test for cluster validity using an index such as the Hopkins statistic not only tends to provide an answer to the ever-elusive question "how many clusters to find?" but also provides a validation measure for individual clusters. If H0 cannot be rejected for any of the clusters at the lowest c = c* partitioning, then one could argue that all the clusters found are in fact the true natural clusters in the data. Not intended to replace the existing cluster validity techniques and indices, the test for the random position hypothesis is an interesting and promising addition to the repertoire of cluster validity methodologies.

Future research might focus on applying the test directly to the fuzzy partition, instead of having to defuzzify the c-partition first. The fuzzy membership function can be used to weight some (or all) of the distances used in the calculation of the Hopkins statistic. The complete partition could then be evaluated in one go, instead of one cluster at a time.

REFERENCES

[1] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, 1988.
[2] L. J. Hubert and J. Schultz, "Quadratic assignment as a general data-analysis strategy," British J. Math. Statist. Psych., vol. 19, pp. 191-241, 1976.
[3] L. A. Goodman and W. H. Kruskal, "Measures of association for cross-classifications," JASA, vol. 49, pp. 732-764, 1954.
[4] J. C. Bezdek, "Numerical taxonomy with fuzzy sets," J. Math. Biol., vol. 1, pp. 57-71, 1974.
[5] J. C. Bezdek, "Mathematical models for systematics and taxonomy," in Proc. 8th Int. Conf. Numerical Taxonomy, Ed. G. Estabrook, San Francisco, CA: Freeman, pp. 143-166, 1975.
[6] M. P. Windham, "Cluster validity for fuzzy clustering algorithms," Fuzzy Sets Syst., vol. 5(2), pp. 177-185, 1981.
[7] M. P. Windham, "Cluster validity for the fuzzy c-means clustering algorithm," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-4(4), pp. 357-363, 1982.
[8] G. Libert and M. Roubens, "New experimental results in cluster validity of fuzzy clustering algorithms," in New Trends in Data Analysis and Applications, Eds. J. Janssen, J.-F. Marcotorchino and J.-M. Proth, Amsterdam, The Netherlands: North Holland, pp. 205-218, 1983.
[9] M. P. Windham, H. Bock and H. F. Walker, "Clustering information from convergence rate," in Proc. 2nd Conf. Int. Federation Classification Soc., Washington, D.C., pp. 143, 1989.
[10] R. Gunderson, "Application of fuzzy ISODATA algorithms to star tracker pointing systems," in Proc. 7th Triennial World IFAC Congress, Helsinki, Finland, pp. 1319-1323, 1978.
[11] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-13(8), pp. 841-847, 1991.
[12] R. N. Davé, "Validating fuzzy partitions obtained through c-shells clustering," Patt. Recog. Letters, vol. 17, pp. 613-623, 1996.
[13] J. C. Bezdek and R. J. Hathaway, "VAT: A tool for visual assessment of (cluster) tendency," in Proc. IJCNN, IEEE Press, Piscataway, NJ, pp. 2225-2230, 2002.
[14] J. C. Bezdek and R. J. Hathaway, "Visual cluster validity displays for prototype generator clustering methods," in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE 2003), pp. 875-880, 2003.
[15] J. I. Naus, "Approximations for distributions of scan statistics," JASA, vol. 77, pp. 177-183, 1982.
[16] R. Mead, "A test for spatial pattern at several scales using data from a grid of contiguous quadrats," Biometrics, vol. 30, pp. 295-308, 1974.
[17] B. D. Ripley, "Modelling spatial patterns," J. Royal Statist. Soc. B, vol. 39, pp. 172-212, 1977.
[18] B. Hopkins, "A new method of determining the type of distribution of plant individuals," Annals of Botany, vol. 18, pp. 213-226, 1954.
[19] P. Holgate, "Some new tests of randomness," J. Ecol., vol. 53, pp. 261-266, 1965.
[20] P. Holgate, "Tests of randomness based on distance measures," Biometrika, vol. 52, pp. 345-, 1965.
[21] J. E. Besag and J. T. Gleaves, "On the detection of spatial pattern in plant communities," Bull. Int. Statist. Inst., vol. 45, pp. 153-158, 1973.
[22] L. L. Eberhardt, "Some developments in distance sampling," Biometrics, vol. 23, pp. 207-216, 1967.
[23] T. F. Cox and T. Lewis, "A conditioned distance ratio method for analyzing spatial patterns," Biometrika, vol. 63, pp. 483-491, 1976.
[24] E. Panayirci and R. C. Dubes, "A test for multidimensional clustering tendency," Patt. Recog., vol. 16(4), pp. 433-444, 1983.
[25] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, NY, 1981.