Please note: This PhD seminar will take place in DC 2584 and online.
William Loh, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Pascal Poupart
Contextual bandits are useful in many practical problems. We go one step further by devising a more realistic problem that combines: (1) contextual bandits with dense arm features, (2) non-linear reward functions, and (3) a generalization of correlated bandits in which reward distributions change over time while the degree of correlation is maintained. This formulation lends itself to a wider set of applications, such as recommendation tasks.
To solve this problem, we introduce conditionally coupled contextual (C3) Thompson sampling for Bernoulli bandits. It combines an improved Nadaraya-Watson estimator on an embedding space with Thompson sampling, allowing online learning without retraining. Empirical results show that C3 outperforms the next-best algorithm, achieving 5.7% lower average cumulative regret on four OpenML tabular datasets and a 12.4% click lift on the Microsoft News Dataset (MIND) compared to other algorithms.
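As background for the talk: the classical Thompson sampling procedure for Bernoulli bandits that C3 builds on can be sketched as below. This is a minimal Beta-Bernoulli sketch only, not the C3 algorithm itself; the contextual embedding space, the Nadaraya-Watson estimator, and the coupling of correlated arms described in the abstract are omitted, and all function names and parameters here are illustrative assumptions.

```python
import random

def thompson_select(successes, failures):
    """Pick an arm by sampling each arm's reward rate from its Beta posterior.

    Beta(s + 1, f + 1) is the posterior for a Bernoulli arm with a uniform
    prior after s observed successes and f observed failures.
    """
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bernoulli_bandit(true_probs, n_rounds, seed=0):
    """Simulate n_rounds of Thompson sampling on arms with the given
    (unknown to the agent) Bernoulli reward probabilities."""
    random.seed(seed)
    k = len(true_probs)
    successes, failures = [0] * k, [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures)
        reward = 1 if random.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

Because the posteriors are updated one observation at a time, the learner adapts online without any retraining step, which is the property the abstract highlights; C3 extends this idea to contextual, correlated, non-stationary arms.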
To attend this PhD seminar in person, please go to DC 2584. You can also attend virtually on MS Teams.