A hybrid approach to clustering optimization
Abstract
A known limitation of many clustering algorithms is their inability to guarantee convergence to a global optimum. The K-means algorithm, as a representative example, given data x and an objective function L to be optimized, guarantees only that the cluster assignments it returns optimize L locally. Consequently, after a single run of K-means one cannot generally know whether the globally optimal solution or an inferior local optimum was found. This limitation is not exclusive to K-means; other algorithms, such as the EM algorithm for Gaussian mixture models, are likewise liable to converge to a sub-optimal solution. In practice, this limitation can be partially mitigated by running the chosen clustering algorithm several times and keeping the best solution found. However, this approach is inherently uncertain: at no point does the practitioner know whether the optimal solution has been found or whether more attempts are required. A single algorithm offering a higher degree of assurance that the global optimum has been reached would be preferable. In this paper we review existing clustering methods, discuss their limitations, and present a novel clustering algorithm, with an implementation in R, that achieves higher assurance of global optimization than traditional clustering methods. By strategically passing the optimization problem between an EM algorithm and a Gibbs sampler, our algorithm combines the strengths of local optimization and global search methods to explore the complex loss landscape of the objective function, ensuring more reliable convergence to optimal solutions.
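The multi-restart heuristic described above can be made concrete with a small sketch. The paper's implementation is in R; the following is a hypothetical, self-contained Python illustration (not the paper's hybrid method) of plain Lloyd's-algorithm K-means, which converges only to a local optimum, wrapped in a best-of-restarts loop that keeps the lowest-loss run. All function names and the toy data are assumptions for illustration.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on 1-D data; converges to a local optimum only."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization drives which optimum is found
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[j].append(x)
        # Update step: each center moves to its cluster mean (kept if cluster is empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    loss = sum(min((x - c) ** 2 for c in centers) for x in points)
    return centers, loss

def best_of_restarts(points, k, restarts=10):
    """Run K-means from several random initializations and keep the best loss.

    This reduces, but never eliminates, the risk of settling for a local optimum:
    the practitioner still cannot tell whether more restarts would help.
    """
    runs = [kmeans(points, k, seed=s) for s in range(restarts)]
    return min(runs, key=lambda r: r[1])
```

By construction the restart wrapper can only match or improve on any single run it contains, which is exactly the uncertainty the abstract points out: improvement is empirical, not guaranteed, which motivates the hybrid EM/Gibbs approach.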