Evaluation of optimal clusterings found by cluster validation measures

dc.contributor.authorShi, Yu
dc.date.accessioned2020-08-03T21:35:30Z
dc.date.available2020-08-03T21:35:30Z
dc.date.graduationmonthAugust
dc.date.issued2020-08-01
dc.description.abstractThere are many measures developed for assessing clustering algorithms. However, little work has been done to determine what type of clusterings these validation measures would consider "the best." In particular, if a clustering validation measure performs well, then it should be able to identify the "correct" clustering when presented with all possible ways of clustering a dataset. We evaluate the performance of five clustering validation measures (Silhouette, Hubert-Gamma, R-squared, the Dunn family of indices, and the Davies-Bouldin index) on five small clustered datasets. To obtain a large set of candidate clusterings, we view each dataset as a graph and form a connected bottleneck subgraph. On this subgraph, we identify all set-connected partitions (those whose blocks are connected) that satisfy a set of constraints on the number of blocks and the size of each block within the partition. We then apply each validation measure to every candidate partition to determine the clustering that the measure considers to be optimal. Based on test results, we find each measure has its own preferences. For example, the silhouette measure tends to be better at capturing connected regions, and many other measures prefer clusterings that contain many clusters. Finally, we compare the clusterings found by the validation measures to those obtained by other popular clustering methods, including k-means, hierarchical agglomerative clustering (HAC), density-based spatial clustering of applications with noise (DBSCAN), and ordering points to identify the clustering structure (OPTICS).
dc.description.advisorMichael J. Higgins
dc.description.degreeMaster of Science
dc.description.departmentDepartment of Statistics
dc.description.levelMasters
dc.identifier.urihttps://hdl.handle.net/2097/40778
dc.language.isoen_US
dc.publisherKansas State University
dc.rights© the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/
dc.subjectClustering
dc.subjectCluster validation measures
dc.titleEvaluation of optimal clusterings found by cluster validation measures
dc.typeReport
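
The evaluation procedure described in the abstract (enumerate candidate partitions under constraints, score each with a validation measure, keep the best) can be illustrated with a minimal sketch. This is not the author's code. It assumes a toy two-group dataset, uses the mean silhouette width as the only measure, and enumerates all set partitions rather than only the set-connected partitions of a bottleneck subgraph. The function names and the constraint values (2 to 3 blocks, at least 2 points per block) are hypothetical choices for illustration.

```python
from math import dist

def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partition

def silhouette(points, partition):
    """Mean silhouette width; `partition` is a list of blocks of point indices."""
    if len(partition) < 2:
        raise ValueError("silhouette needs at least two blocks")
    total = 0.0
    for block in partition:
        for i in block:
            if len(block) == 1:
                continue  # a singleton contributes 0 by convention
            # a: mean distance to the other points in the same block
            a = sum(dist(points[i], points[j]) for j in block if j != i) / (len(block) - 1)
            # b: smallest mean distance to the points of any other block
            b = min(
                sum(dist(points[i], points[j]) for j in other) / len(other)
                for other in partition if other is not block
            )
            denom = max(a, b)
            total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

def best_partition(points, min_blocks=2, max_blocks=3, min_size=2):
    """Return the admissible partition with the highest mean silhouette."""
    candidates = (
        p for p in set_partitions(list(range(len(points))))
        if min_blocks <= len(p) <= max_blocks and all(len(b) >= min_size for b in p)
    )
    return max(candidates, key=lambda p: silhouette(points, p))

# Toy data (an assumption): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best = best_partition(points)
```

Exhaustive enumeration grows with the Bell numbers, so a sketch like this is only practical for very small datasets. Restricting candidates to set-connected partitions of a sparse subgraph, as the abstract describes, is one way to keep the candidate set manageable.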

Files

Original bundle

Name:
YuShi2020.pdf
Size:
501.84 KB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.62 KB
Format:
Item-specific license agreed upon to submission