A Mathematical Framework for Improving March Madness Models

Date

2025

Abstract

Every year, millions of people attempt to predict the outcomes of the NCAA Division I Men’s Basketball Tournament (“March Madness”), often turning to models or team rankings to inform their decisions. We propose a mathematical framework that takes any model capable of returning the probability that a team i beats an opponent j and converts those pairwise probabilities into round-specific win probabilities and expected wins. We illustrate the framework by fitting models to 24 years of historical tournament data using differences in the teams’ statistics, and find that all of them outperform a traditional seed-based “chalk” model. Two of our best models are an XGBoost model, with a median accuracy of 66.67% and median total points of 116.5, and an Elastic-Net model, with a median accuracy of 73.02% and median total points of 115. We also rank teams by their expected wins under the XGBoost model and find that 22 of the past 25 champions ranked in our top 4, with 11 ranked first. Additionally, we apply the framework to a knapsack bracket competition, which involves optimizing team selection under a cost constraint. The solutions from an Elastic-Net model average nearly 23 wins and capture almost 70% of the maximum achievable win total.
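The Python sketch below illustrates how pairwise probabilities can be turned into round-specific win probabilities and expected wins using the standard bracket recursion. It rests on assumptions of ours, not details taken from the thesis. Teams are listed in bracket order, so team 2k meets team 2k+1 in the first round. The field size is a power of two, p[i][j] + p[j][i] = 1, and game outcomes are independent. The knapsack step is sketched as a plain 0/1 dynamic program. The integer team costs and the budget are hypothetical, since the competition's actual cost rules are not given here. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def round_win_probs(p: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return W, where W[i][r] is the probability that team i wins its first r games.

    Assumptions (illustrative, not from the thesis): p[i][j] is the probability
    that team i beats team j, p[i][j] + p[j][i] == 1, teams are in bracket order,
    and game outcomes are independent.
    """
    n = len(p)
    rounds = n.bit_length() - 1
    if n < 2 or 2 ** rounds != n:
        raise ValueError("number of teams must be a power of two")
    W = [[1.0] + [0.0] * rounds for _ in range(n)]
    for r in range(1, rounds + 1):
        half = 2 ** (r - 1)  # size of each sub-bracket that meets in round r
        for i in range(n):
            sibling = (i // half) ^ 1  # index of the opposing sub-bracket
            opponents = range(sibling * half, (sibling + 1) * half)
            # Team i must survive to round r, then beat whoever emerges opposite.
            beat = sum(W[j][r - 1] * p[i][j] for j in opponents)
            W[i][r] = W[i][r - 1] * beat
    return W


def expected_wins(W: Sequence[Sequence[float]]) -> List[float]:
    """Expected wins for each team: the sum of its round-by-round win probabilities."""
    return [sum(row[1:]) for row in W]


def best_selection(values: Sequence[float], costs: Sequence[int],
                   budget: int) -> Tuple[float, List[int]]:
    """0/1 knapsack: choose teams maximizing total expected wins within a budget.

    The integer costs and budget are hypothetical inputs for illustration.
    """
    n = len(values)
    # best[k][b] = max total value using the first k teams with total cost <= b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        v, c = values[k - 1], costs[k - 1]
        for b in range(budget + 1):
            best[k][b] = best[k - 1][b]  # option 1: skip team k-1
            if c <= b and best[k - 1][b - c] + v > best[k][b]:
                best[k][b] = best[k - 1][b - c] + v  # option 2: take team k-1
    # Walk back through the table to recover which teams were taken.
    chosen, b = [], budget
    for k in range(n, 0, -1):
        if best[k][b] != best[k - 1][b]:
            chosen.append(k - 1)
            b -= costs[k - 1]
    return best[n][budget], sorted(chosen)


# Toy 4-team bracket with made-up pairwise probabilities (rows beat columns).
example_p = [
    [0.5, 0.9, 0.8, 0.7],
    [0.1, 0.5, 0.4, 0.3],
    [0.2, 0.6, 0.5, 0.45],
    [0.3, 0.7, 0.55, 0.5],
]
W = round_win_probs(example_p)
wins = expected_wins(W)
value, chosen = best_selection(wins, costs=[3, 1, 2, 2], budget=4)
print([round(w, 4) for w in wins], chosen, round(value, 4))
```

Each game produces exactly one win, so the expected wins across the whole field should sum to the number of games (n - 1). This gives a quick sanity check on any pairwise model plugged into the recursion.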

Keywords

March Madness, Predictive Modeling, Knapsack Problem, Sports Analytics

Graduation Month

May

Degree

Master of Science

Department

Department of Statistics

Major Professor

Michael J. Higgins

Type

Thesis