A Mathematical Framework for Improving March Madness Models
Abstract
Every year, millions of people attempt to predict the outcomes of the NCAA Division I Men’s Basketball Tournament (“March Madness”), often turning to models or team rankings to inform their decisions. We propose a mathematical framework that takes any model capable of returning the probability that some team i beats some opponent j and converts it into round-specific win probabilities and expected wins for each team. We illustrate the framework by fitting models to 24 years of historical tournament data, using the differences in the teams’ stats as predictors, and find that all of them outperform a traditional seed-based “chalk” model. Two of our best models are an XGBoost model, with a median accuracy of 66.67% and median total points of 116.5, and an Elastic-Net model, with a median accuracy of 73.02% and median total points of 115. We also rank teams by expected wins from the XGBoost model and find that 22 of the past 25 champions ranked within our top 4, with 11 ranked first. Additionally, we apply the framework to a knapsack bracket competition, in which teams must be selected to maximize wins under a cost constraint. The solutions from an Elastic-Net model average nearly 23 wins and capture almost 70% of the maximum achievable win total.
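To make the two computations named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-elimination bracket whose size is a power of two, with teams listed in bracket order so that adjacent teams meet in round 1. The function names, the Bradley-Terry-style strengths used to build the pairwise matrix, and the integer costs and budget are all hypothetical and chosen only for illustration. The first function propagates pairwise probabilities P(i beats j) through the bracket to obtain round-specific win probabilities, the second sums them into expected wins, and the third solves a standard 0/1 knapsack over those expected wins.

```python
def round_win_probabilities(p):
    """p[i][j] = P(team i beats team j); teams are listed in bracket order.

    Returns table where table[r - 1][i] = P(team i wins its round-r game).
    """
    n = len(p)
    rounds = n.bit_length() - 1
    if n < 2 or 2 ** rounds != n:
        raise ValueError("number of teams must be a power of two")
    reach = [1.0] * n  # P(team is still alive entering the current round)
    table = []
    for r in range(1, rounds + 1):
        block, half = 2 ** r, 2 ** (r - 1)
        won = [0.0] * n
        for i in range(n):
            start = (i // block) * block
            # Possible round-r opponents sit in the other half of i's block.
            if i - start < half:
                opponents = range(start + half, start + block)
            else:
                opponents = range(start, start + half)
            won[i] = reach[i] * sum(reach[j] * p[i][j] for j in opponents)
        table.append(won)
        reach = won
    return table


def expected_wins(table):
    """Expected wins per team = sum over rounds of P(team wins that round)."""
    return [sum(rnd[i] for rnd in table) for i in range(len(table[0]))]


def select_teams(values, costs, budget):
    """0/1 knapsack by dynamic programming; costs and budget are integers."""
    n = len(values)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        v, c = values[k - 1], costs[k - 1]
        for b in range(budget + 1):
            best[k][b] = best[k - 1][b]
            if c <= b and best[k - 1][b - c] + v > best[k][b]:
                best[k][b] = best[k - 1][b - c] + v
    # Walk back through the table to recover which teams were chosen.
    chosen, b = [], budget
    for k in range(n, 0, -1):
        if best[k][b] != best[k - 1][b]:
            chosen.append(k - 1)
            b -= costs[k - 1]
    return sorted(chosen), best[n][budget]


# Hypothetical 4-team bracket; strengths stand in for any fitted model.
strengths = [4.0, 1.0, 3.0, 2.0]
p = [[si / (si + sj) for sj in strengths] for si in strengths]
table = round_win_probabilities(p)
wins = expected_wins(table)
picks, total = select_teams(wins, costs=[4, 1, 3, 2], budget=5)
```

Because exactly one team wins each game, the probabilities in round r sum to the number of games in that round, and the expected wins across all teams sum to one less than the number of teams. These identities give a quick sanity check on any pairwise model plugged into the recursion.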