Density and partition based clustering on massive threshold bounded data sets

dc.contributor.author: Kannamareddy, Aruna Sai
dc.date.accessioned: 2017-04-21T14:01:26Z
dc.date.available: 2017-04-21T14:01:26Z
dc.date.graduationmonth: May
dc.date.issued: 2017-05-01
dc.date.published: 2017
dc.description.abstract: This project explores whether clustering of massive data sets can be made more efficient by first grouping the data with the threshold blocking algorithm. The clusters formed this way are denser and of higher quality. Individual clustering algorithms used alone do not necessarily eliminate outliers, and the clusters they generate can be complex or improperly distributed over the data set. The threshold blocking algorithm, described in a recent research paper by Michael Higgins of the Statistics Department, performs better than existing algorithms at forming dense, distinctive units with a predefined threshold. Part of this project is to develop a hybridized algorithm that applies existing clustering algorithms to re-cluster the units thus formed. Clustering on the seeds produced by the threshold blocking algorithm eases the task for the existing algorithm by removing the overhead of handling outliers. The resulting clusters are also more representative of the whole. Because the threshold blocking algorithm has been shown to be fast and efficient, more predictions can be made from large data sets in less time. The hybridized algorithm is evaluated by predicting similar songs from the Million Song Dataset.
dc.description.advisor: William H. Hsu
dc.description.degree: Master of Science
dc.description.department: Department of Computing and Information Sciences
dc.description.level: Masters
dc.identifier.uri: http://hdl.handle.net/2097/35467
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.subject: Threshold blocking
dc.subject: Clustering
dc.subject: Kmeans
dc.subject: Dbscan
dc.subject: Hybrid cluster model
dc.title: Density and partition based clustering on massive threshold bounded data sets
dc.type: Report
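
The two-stage idea in the abstract (form threshold-bounded units first, then re-cluster those units with an existing algorithm) can be illustrated with a minimal sketch. This is not the report's implementation: `greedy_threshold_units` is a simplified greedy stand-in that only guarantees each unit has at least `threshold` members, and it is not the threshold blocking algorithm from the cited research. The second stage is a plain k-means (Lloyd's iteration) run on the unit centroids. All function and parameter names are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def greedy_threshold_units(points, threshold):
    """Simplified stand-in (an assumption, NOT threshold blocking itself):
    greedily group points into units of at least `threshold` members."""
    if threshold < 1 or len(points) < threshold:
        raise ValueError("need at least `threshold` points")
    unassigned = set(range(len(points)))
    units = []
    while len(unassigned) >= threshold:
        seed = min(unassigned)
        unassigned.remove(seed)
        # Take the seed's nearest unassigned neighbours; ties break by index.
        nearest = sorted(
            unassigned, key=lambda i: (dist(points[seed], points[i]), i)
        )[: threshold - 1]
        unassigned.difference_update(nearest)
        units.append([seed] + nearest)
    # Leftover points join the unit whose centroid is closest.
    for i in sorted(unassigned):
        best = min(
            units,
            key=lambda u: dist(points[i], centroid([points[j] for j in u])),
        )
        best.append(i)
    return units


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # raises ValueError if k > len(points)

    def assign():
        return [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]

    for _ in range(iterations):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return assign()


def hybrid_cluster(points, threshold, k):
    """Stage 1: build units. Stage 2: k-means on the unit centroids.
    Each point inherits the label of the unit it belongs to."""
    units = greedy_threshold_units(points, threshold)
    unit_centroids = [centroid([points[i] for i in u]) for u in units]
    unit_labels = kmeans(unit_centroids, k)
    labels = [None] * len(points)
    for unit, lab in zip(units, unit_labels):
        for i in unit:
            labels[i] = lab
    return units, labels
```

Because the second stage sees only one centroid per unit, it works on far fewer items than the raw data set, which is one plausible reading of the efficiency gain the abstract describes. A density-based algorithm such as DBSCAN could be substituted for k-means in the second stage.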

Files

Original bundle
Name: kannamareddy2017density.pdf
Size: 1.99 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission