Machine learning for high performance computing applications

Hutchison, Scott

Files

ScottHutchison2024.pdf (1.47 MB)

Date

2024

Authors

Hutchison, Scott

Publisher

Kansas State University

Abstract

The focus of this study was to apply state-of-the-art Machine Learning (ML) techniques to problems in the High Performance Computing (HPC) domain. The ML techniques included clustering, various types of regression, a recommender system, and reinforcement learning using proximal policy optimization. Three applications of these techniques are presented. The first application used K-means clustering and Gradient Boosted Tree Regression (GBTR) to predict the estimated queue time for jobs submitted to an HPC system. This method achieved 96% accuracy when predicting whether or not a job would start prior to a specified deadline. The second application focused on optimizing hardware procurement for HPC systems while remaining under a fixed budget. Vendor quotes for new hardware were used with a custom Discrete Event Simulator (DES) to simulate the execution of a job workload on proposed hardware. An Extreme Gradient Boosting (XGBoost) regression model powers a recommender system that achieves a precision@50 of 92%. The third application used Proximal Policy Optimization (PPO) with Invalid Action Masking (IAM) to train a Reinforcement Learning (RL) agent to schedule jobs on a simulated HPC system. The performance of this RL agent was compared to modern scheduling algorithms. The RL agent performed 18.44% better than the algorithmic baselines for one metric and comparably to the baselines for another.
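
For readers unfamiliar with the precision@50 figure quoted in the abstract: precision@k is conventionally the fraction of the top k recommended items that are actually relevant. The following is a minimal sketch of that general definition only. The function name, the item labels, and the notion of "relevant" used here are hypothetical illustrations; the abstract does not state how the dissertation defines relevance or computes the metric.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Return the fraction of the top-k ranked items that are relevant.

    ranked_items: recommendations ordered best-first (hypothetical input).
    relevant_items: the items considered correct (hypothetical ground truth).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_items)
    # Count how many of the first k recommendations appear in the relevant set.
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k


# Made-up example data, not taken from the dissertation.
recommended = ["config_a", "config_b", "config_c", "config_d"]
truly_best = {"config_a", "config_c", "config_x"}
score = precision_at_k(recommended, truly_best, k=4)
```

Under this definition, a precision@50 of 92% would mean that 46 of the top 50 recommendations were relevant.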

Keywords

High performance computing, Machine learning, Reinforcement learning, Regression, Artificial intelligence

Graduation Month

May

Degree

Doctor of Philosophy

Department

Department of Computer Science

Major Professor

Daniel A. Andresen

Type

Dissertation

URI

https://hdl.handle.net/2097/44307

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -
