Machine Learning for High Performance Computing Applications
dc.contributor.author | Hutchison, Scott | |
dc.date.accessioned | 2024-04-15T19:01:52Z | |
dc.date.available | 2024-04-15T19:01:52Z | |
dc.date.graduationmonth | May | |
dc.date.published | 2024 | |
dc.description.abstract | The focus of this study was to apply state-of-the-art Machine Learning (ML) techniques to problems in the High Performance Computing (HPC) domain. The ML techniques included clustering, several types of regression, a recommender system, and reinforcement learning using proximal policy optimization. Three applications of these techniques are presented. The first application used K-means clustering and Gradient Boosted Tree Regression (GBTR) to predict the queue time of jobs submitted to an HPC system. This method achieved 96% accuracy in predicting whether a job would start before a specified deadline. The second application focused on optimizing hardware procurement for HPC systems under a fixed budget. Vendor quotes for new hardware were used with a custom Discrete Event Simulator (DES) to simulate the execution of a job workload on the proposed hardware. An Extreme Gradient Boosting (XGBoost) regression model powers a recommender system that achieves a precision@50 of 92%. The third application used Proximal Policy Optimization (PPO) with Invalid Action Masking (IAM) to train a Reinforcement Learning (RL) agent to schedule jobs on a simulated HPC system. The RL agent's performance was compared against modern scheduling algorithms: it performed 18.44% better than the algorithmic baselines on one metric and comparably to them on another. | |
dc.description.advisor | Daniel A. Andresen | |
dc.description.degree | Doctor of Philosophy | |
dc.description.department | Department of Computer Science | |
dc.description.level | Doctoral | |
dc.identifier.uri | https://hdl.handle.net/2097/44307 | |
dc.language.iso | en_US | |
dc.subject | High Performance Computing | |
dc.subject | Machine Learning | |
dc.subject | Reinforcement Learning | |
dc.subject | Regression | |
dc.subject | Artificial Intelligence | |
dc.title | Machine Learning for High Performance Computing Applications | |
dc.type | Dissertation | |