Please note: This PhD seminar will take place in DC 2584 and online.
William Loh, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Pascal Poupart
Contextual bandits are useful in many practical problems. We go one step further by devising a more realistic problem that combines: (1) contextual bandits with dense arm features, (2) non-linear reward functions, and (3) a generalization of correlated bandits in which reward distributions change over time while the degree of correlation is maintained. This formulation lends itself to a wider set of applications, such as recommendation tasks.
To solve this problem, we introduce conditionally coupled contextual (C3) Thompson sampling for Bernoulli bandits. It combines an improved Nadaraya-Watson estimator on an embedding space with Thompson sampling, allowing online learning without retraining. Empirical results show that C3 outperforms the next-best algorithm, achieving 5.7% lower average cumulative regret on four OpenML tabular datasets and a 12.4% click lift on the Microsoft News Dataset (MIND) compared to other algorithms.
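As background for the talk: the classical Thompson sampling procedure for Bernoulli bandits that C3 builds on can be sketched as below. This is a minimal Beta-Bernoulli sketch only, not the C3 algorithm itself; the contextual embedding space, the Nadaraya-Watson estimator, and the coupling of correlated arms described in the abstract are omitted, and all function names and parameters here are illustrative assumptions.

```python
import random

def thompson_select(successes, failures):
    """Pick an arm by sampling each arm's reward rate from its Beta posterior.

    Beta(s + 1, f + 1) is the posterior for a Bernoulli arm with a uniform
    prior after s observed successes and f observed failures.
    """
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bernoulli_bandit(true_probs, n_rounds, seed=0):
    """Simulate n_rounds of Thompson sampling on arms with the given
    (unknown to the agent) Bernoulli reward probabilities."""
    random.seed(seed)
    k = len(true_probs)
    successes, failures = [0] * k, [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures)
        reward = 1 if random.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

Because the posteriors are updated one observation at a time, the learner adapts online without any retraining step, which is the property the abstract highlights; C3 extends this idea to contextual, correlated, non-stationary arms.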
To attend this PhD seminar in person, please go to DC 2584. You can also attend virtually on MS Teams.