Machine Learning for High Performance Computing Applications

Abstract

This study applied state-of-the-art Machine Learning (ML) techniques to problems in the High Performance Computing (HPC) domain. The techniques included clustering, several types of regression, a recommender system, and reinforcement learning using proximal policy optimization. Three applications of these techniques are presented. The first used K-means clustering and Gradient Boosted Tree Regression (GBTR) to predict the queue time of jobs submitted to an HPC system. This method achieved 96% accuracy when predicting whether a job would start before a specified deadline. The second application focused on optimizing hardware procurement for HPC systems under a fixed budget. Vendor quotes for new hardware were used with a custom Discrete Event Simulator (DES) to simulate the execution of a job workload on the proposed hardware. An Extreme Gradient Boosting (XGBoost) regression model powers a recommender system that achieves a precision@50 of 92%. The third application used Proximal Policy Optimization (PPO) with Invalid Action Masking (IAM) to train a Reinforcement Learning (RL) agent to schedule jobs on a simulated HPC system. The performance of this RL agent was compared to modern scheduling algorithms. The RL agent performed 18.44% better than the algorithmic baselines on one metric and comparably to the baselines on another.
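For readers unfamiliar with the precision@50 figure quoted above, the following is a minimal sketch of how precision@k is conventionally computed for a ranked recommendation list. It is not taken from the dissertation: the function name `precision_at_k`, the example configuration labels, and the way "relevant" items are chosen are all assumptions made for illustration.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked_items: Sequence[Hashable],
                   relevant: Set[Hashable],
                   k: int) -> float:
    """Fraction of the top-k ranked items that are in the relevant set.

    Uses the conventional definition: hits in the first k positions
    divided by k. How relevance is defined is an assumption left to
    the caller.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = list(ranked_items)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical example: four recommended hardware configurations,
# two of which appear in an assumed ground-truth set of good choices.
recommended = ["cfg_a", "cfg_b", "cfg_c", "cfg_d"]
ground_truth = {"cfg_a", "cfg_c", "cfg_x"}
score = precision_at_k(recommended, ground_truth, k=4)
```

Under this definition, a precision@50 of 92% would mean that 46 of the top 50 recommendations fall in the relevant set.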

Keywords

High Performance Computing, Machine Learning, Reinforcement Learning, Regression, Artificial Intelligence

Graduation Month

May

Degree

Doctor of Philosophy

Department

Department of Computer Science

Major Professor

Daniel A. Andresen

Date

2024

Type

Dissertation
