How Good is Forced-Exploration in Linear Stochastic Bandits?
Invited Talk
Csaba Szepesvari (University of Alberta)
Abstract:
Assume that in a stochastic bandit problem the payoff is a linear
function of the actions. What is a good algorithm for this class
of problems? In this talk we will investigate an algorithm that
randomly explores the action space for some number of time steps
and otherwise plays the greedy action. It turns out that for this
class of problems the algorithm is not as bad as one might at
first think. The theory will be illustrated by numerical
experiments on the ad allocation problem.
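The strategy described above can be sketched in a few lines. The following is a minimal illustration, not the speaker's actual algorithm: it assumes a finite set of action vectors, rewards that are a linear function of the chosen action plus Gaussian noise, and a fixed number m of forced-exploration rounds, after which it plays greedily against a least-squares estimate of the unknown parameter. All names and parameter choices here are hypothetical.

```python
import numpy as np

def forced_exploration_bandit(actions, theta_star, m, T, noise_std=0.1, seed=0):
    """Forced exploration in a linear stochastic bandit (illustrative sketch).

    actions:    (K, d) array of fixed action vectors (finite action set assumed)
    theta_star: (d,) unknown parameter; reward of action a is <a, theta_star> + noise
    m:          number of initial rounds of uniformly random exploration
    T:          total number of rounds
    """
    rng = np.random.default_rng(seed)
    K, d = actions.shape
    played, rewards = [], []            # history of (action, reward) pairs
    total_reward = 0.0
    for t in range(T):
        if t < m:
            # forced-exploration phase: pick an action uniformly at random
            idx = rng.integers(K)
        else:
            # greedy phase: least-squares estimate of theta_star from history,
            # then play the action with the highest estimated payoff
            X = np.asarray(played)
            theta_hat, *_ = np.linalg.lstsq(X, np.asarray(rewards), rcond=None)
            idx = int(np.argmax(actions @ theta_hat))
        a = actions[idx]
        r = a @ theta_star + noise_std * rng.standard_normal()
        played.append(a)
        rewards.append(r)
        total_reward += r
    return total_reward

# Toy example: two actions e_1, e_2; e_1 is optimal since theta_star = (1, 0)
acts = np.eye(2)
reward = forced_exploration_bandit(acts, np.array([1.0, 0.0]), m=20, T=200)
```

After the m exploration rounds the least-squares estimate is accurate enough that the greedy phase almost always selects the optimal action, so the cumulative reward approaches that of always playing the best arm.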