Optimizing high performance computing system’s, resource utilization and
throughput by leveraging machine learning

Dunn, Brandon

Files

BrandonDunn2021.pdf (621.17 KB)

Authors

Dunn, Brandon

Abstract

High Performance Computing (HPC) facilitates a significant portion of research and analytics across many different fields, industries, and educational settings. HPC is implemented using supercomputers, which can comprise anywhere from a few servers to thousands. HPC systems typically use a scheduler, such as Slurm, to manage the execution of tasks on the system. Schedulers typically have hundreds of configuration parameters. With such diverse workflows and hardware, the question becomes: how do we adapt these HPC schedulers so that we maintain high utilization and throughput on the systems? Our research focuses on optimizing the Slurm scheduler by adapting its configuration options to the type of hardware in the HPC system and the types of workflows it runs, using semi-supervised machine learning.

Keywords

SLURM, HPC, Machine learning

Graduation Month

May

Degree

Master of Science

Department

Department of Computer Science

Major Professor

Daniel A. Andresen

Date

2021

Type

Thesis

URI

https://hdl.handle.net/2097/41685

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -
