A hybrid approach to clustering optimization
Abstract
A known limitation of many clustering algorithms is their inability to guarantee convergence to a global optimum. The K-means algorithm, as a representative example, given data x and an objective function L to be optimized, guarantees only that the cluster assignments it returns optimize L locally. Consequently, after a single run of K-means one cannot generally know whether the globally optimal solution or an inferior local optimum was found. This limitation is not exclusive to K-means; other algorithms, such as the EM algorithm for Gaussian mixture models, are likewise liable to converge to a sub-optimal solution. In practice, this limitation can be partially mitigated by running the chosen clustering algorithm several times and keeping the best solution found. However, this approach is inherently uncertain: at no point does the practitioner know whether the optimal solution has been found or whether more attempts are required. A single algorithm offering a higher degree of assurance that the global optimum has been reached would be preferable. In this paper we review existing clustering methods, discuss their limitations, and present a novel clustering algorithm, with an implementation in R, that achieves higher assurance of global optimization than traditional clustering methods. By strategically passing the optimization problem between an EM algorithm and a Gibbs sampler, our algorithm combines the strengths of local optimization and global search methods to explore the complex loss landscape of the objective function, ensuring more reliable convergence to optimal solutions.
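The multi-restart heuristic described above can be made concrete with a small sketch. The paper's implementation is in R; the following is a hypothetical, self-contained Python illustration (not the paper's hybrid method) of plain Lloyd's-algorithm K-means, which converges only to a local optimum, wrapped in a best-of-restarts loop that keeps the lowest-loss run. All function names and the toy data are assumptions for illustration.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on 1-D data; converges to a local optimum only."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization drives which optimum is found
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[j].append(x)
        # Update step: each center moves to its cluster mean (kept if cluster is empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    loss = sum(min((x - c) ** 2 for c in centers) for x in points)
    return centers, loss

def best_of_restarts(points, k, restarts=10):
    """Run K-means from several random initializations and keep the best loss.

    This reduces, but never eliminates, the risk of settling for a local optimum:
    the practitioner still cannot tell whether more restarts would help.
    """
    runs = [kmeans(points, k, seed=s) for s in range(restarts)]
    return min(runs, key=lambda r: r[1])
```

By construction the restart wrapper can only match or improve on any single run it contains, which is exactly the uncertainty the abstract points out: improvement is empirical, not guaranteed, which motivates the hybrid EM/Gibbs approach.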