Evaluation of performance: multi-armed bandit vs. contextual bandit

Date

2019-12-01

Abstract

This work compares two methods, the multi-armed bandit (MAB) and the contextual multi-armed bandit (CMAB), for action recommendation in a sequential decision-making domain, and empirically evaluates their effectiveness on a customer relationship management task. The goal of this project is to experiment with ε-greedy and random selection strategies in order to characterize the exploration vs. exploitation tradeoff, which arises when trying to increase or maximize profit while also gaining new information about the underlying process. The first method under observation, the multi-armed bandit (MAB), is simpler to compute and scales better to larger amounts of data; it has a wide range of applications, including website optimization, clinical trials, adaptive routing, and stock trading. The contextual multi-armed bandit (CMAB) is an extension of the multi-armed bandit that takes the user's past usage patterns into account, especially historical features of the user's search history; its training data incorporates this context, yielding a model that is more accurate but also requires a large amount of user data, which incurs privacy liabilities, an adverse property. This study measures the difference in outcome when the MAB or CMAB has access to user data and assesses, for a real-world application domain, whether this trade-off is significant and worthwhile in the bigger picture.
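For readers unfamiliar with the ε-greedy selection strategy mentioned above, the following minimal sketch (hypothetical illustration, not code from the thesis) shows a context-free MAB with ε-greedy action selection; the class name, number of arms, reward probabilities, and epsilon value are all illustrative assumptions. A CMAB would additionally condition the value estimates on user features (context) rather than keeping a single running mean per arm.

```python
import random


class EpsilonGreedyBandit:
    """Hypothetical epsilon-greedy multi-armed bandit (illustrative only)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # how many times each arm was pulled
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore with probability epsilon; otherwise exploit the best-known arm.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental update of the mean reward estimate for the pulled arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Toy usage: three arms with different (unknown to the agent) payout rates.
    true_probs = [0.2, 0.5, 0.7]
    bandit = EpsilonGreedyBandit(n_arms=3, epsilon=0.1)
    for _ in range(10000):
        arm = bandit.select_arm()
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        bandit.update(arm, reward)
    print("Estimated arm values:", [round(v, 3) for v in bandit.values])
```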

Keywords

Multi-armed bandit

Graduation Month

December

Degree

Master of Science

Department

Department of Computer Science

Major Professor

William H. Hsu

Date

2019

Type

Thesis

Citation