Comparing different machine learning methods on NFL defensive player performance prediction

Date

2024

Publisher

Kansas State University

Abstract

This study examines the effectiveness of various machine learning techniques in predicting defensive player performance in the National Football League (NFL). The dataset, from Kaggle's "NFL BIG DATA BOWL 2024", includes numerous variables describing games, players, plays, and tackle details for NFL players. Six datasets were created by pairing each of three subsets of predictor variables with each of two response variables, 'attack' and 'tackle'. These datasets were then analyzed with multiple machine learning methods. To provide a comprehensive analysis of defensive performance, the research incorporates a broad array of algorithms: Logistic Regression, Random Forest, K-Nearest Neighbors (KNN), AdaBoost, Support Vector Machine (SVM), Decision Tree, and XGBoost. Hyperparameter tuning was used to optimize each model's performance. Among the classifiers, Logistic Regression, XGBoost, and AdaBoost consistently achieved the highest performance metrics, demonstrating their robustness and effectiveness in predicting NFL defensive performance. Decision Tree, SVM, and Random Forest had slightly lower AUC than these top three, but the difference was not statistically significant. KNN had the lowest performance. Its difference from the top three was statistically significant, although its difference from the middle three classifiers was not. The analysis revealed that variables such as 'player_bmi', 'age', 'tackleSuccessRate', 'assistSuccessRate', 'forcedFumble', 'assist', and 'quarter' offer the most predictive power for both tackle and attack outcomes across all classifiers. This comprehensive evaluation underscores the importance of feature selection and hyperparameter tuning in optimizing machine learning models for sports analytics.
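
The comparison workflow summarized above can be sketched as follows. Predictor subsets are paired with response variables, each classifier is tuned by grid search on AUC, and held-out AUC is averaged per classifier. This is a minimal illustration under stated assumptions, not the report's implementation. Synthetic data from `make_classification` stands in for the Big Data Bowl features. The column subsets, hyperparameter grids, 75/25 split, and 3-fold cross-validation are hypothetical choices. XGBoost is omitted so the sketch depends only on scikit-learn. The significance testing of AUC differences is not shown, because the abstract does not name the test used.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical predictor subsets (column indices into the synthetic X).
SUBSETS = {"subset_A": [0, 1, 2], "subset_B": [0, 1, 2, 3, 4, 5], "subset_C": list(range(9))}

# Hypothetical, deliberately small search grids; the report's grids are not given here.
MODELS = {
    "LogisticRegression": (make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
                           {"logisticregression__C": [0.1, 1.0, 10.0]}),
    "DecisionTree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "RandomForest": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    "KNN": (make_pipeline(StandardScaler(), KNeighborsClassifier()),
            {"kneighborsclassifier__n_neighbors": [5, 15]}),
    "SVM": (make_pipeline(StandardScaler(), SVC()), {"svc__C": [0.1, 1.0]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0), {"n_estimators": [50, 100]}),
}


def build_data(seed=0):
    """Synthetic stand-in: one feature matrix and two binary responses."""
    X, y_tackle = make_classification(n_samples=400, n_features=9,
                                      n_informative=5, random_state=seed)
    rng = np.random.default_rng(seed)
    # Second response built from a noisy linear rule on two columns.
    y_attack = (X[:, 0] - X[:, 3] + rng.normal(size=len(X)) > 0).astype(int)
    return X, {"tackle": y_tackle, "attack": y_attack}


def compare_models(X, responses, subsets, models, seed=0):
    """Tune every model on every (subset, response) dataset; return test AUCs."""
    results = {}
    for (subset_name, cols), (resp_name, y) in product(subsets.items(), responses.items()):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X[:, cols], y, test_size=0.25, stratify=y, random_state=seed)
        for model_name, (estimator, grid) in models.items():
            # GridSearchCV clones the estimator, so reusing it across datasets is safe.
            search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=3)
            search.fit(X_tr, y_tr)
            # score() applies the roc_auc scorer to the refit best model.
            results[(subset_name, resp_name, model_name)] = search.score(X_te, y_te)
    return results


def mean_auc_by_model(results):
    """Average each classifier's test AUC over all datasets."""
    by_model = {}
    for (_, _, model_name), auc in results.items():
        by_model.setdefault(model_name, []).append(auc)
    return {name: float(np.mean(aucs)) for name, aucs in by_model.items()}


if __name__ == "__main__":
    X, responses = build_data()
    results = compare_models(X, responses, SUBSETS, MODELS)
    ranked = sorted(mean_auc_by_model(results).items(), key=lambda kv: -kv[1])
    for name, auc in ranked:
        print(f"{name:20s} mean test AUC = {auc:.3f}")
```

Three subsets times two responses yields the six datasets, and each classifier is tuned separately on each one. A paired test on the per-dataset AUCs (or on resampled AUCs) would be the natural next step for judging whether gaps between classifiers are significant.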

Keywords

AUC, XGBoost, NFL defensive performance prediction, Machine learning, Logistic Regression, Random Forest

Graduation Month

August

Degree

Master of Science

Department

Department of Statistics

Major Professor

Haiyan Wang

Type

Report