Please note: This PhD seminar will take place in DC 2585 and online.
Ashish Gaurav, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Pascal Poupart
Inverse reinforcement learning (IRL) is a growing subfield within reinforcement learning (RL). IRL aims to recover a reward function given access to an optimal policy (typically through demonstrations from an expert); this is the inverse of standard RL, which learns an optimal policy given a reward function. The typical assumption in IRL is that the expert data is generated by an agent optimizing only a reward function. In many settings, however, the agent may optimize a reward function subject to constraints, where the constraints induce behaviors that may otherwise be difficult to express with a reward function alone.
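As a rough formalization (in LaTeX notation; the symbols r, c, beta, and gamma are our choices, not necessarily the paper's), the constrained setting assumes the expert solves

\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big] \quad \text{subject to} \quad \mathbb{E}_{\pi}\Big[\sum_{t} c(s_t, a_t)\Big] \le \beta

where r is the reward, c is the constraint function, and beta is a per-episode budget; the expectation makes the constraint "soft," holding on average rather than on every trajectory. IRL-style constraint learning inverts this: given r and expert data, infer c.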
Recovering both the reward and constraint(s) is a difficult problem due to unidentifiability. We therefore consider the setting where the reward function is given and the constraint is unknown, and we propose a method that recovers the constraint satisfactorily from the expert data. While previous work in this setting has typically learned hard constraints, our method can recover cumulative soft constraints that the agent satisfies on average per episode. In IRL fashion, our method solves this problem by adjusting the constraint function iteratively through a constrained optimization procedure, until the agent behavior matches the expert behavior. We demonstrate our approach on synthetic environments, robotics environments, and real-world highway driving scenarios, and discuss the results and possible implications.
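A toy numerical sketch of this iterative loop, written for illustration only (the feature counts, learning rate, and names such as solve_constrained_rl and expert_visits are our simplifications, not the paper's implementation):

import numpy as np

# Toy setup: 5 discrete state-action "features". The expert's feature
# visitation frequencies are given; the constraint weights are hidden.
n_features = 5
expert_visits = np.array([0.4, 0.05, 0.3, 0.1, 0.15])

c = np.zeros(n_features)   # learned constraint weights, initialized to zero
lr = 0.5

def solve_constrained_rl(c):
    """Stand-in for a constrained RL solver: returns the visitation
    frequencies of a policy that avoids features the current constraint
    penalizes (softmax over negated, scaled costs)."""
    visits = np.exp(-5.0 * c)
    return visits / visits.sum()

for _ in range(200):
    agent_visits = solve_constrained_rl(c)
    # IRL-style update: raise the constraint where the agent visits more
    # often than the expert, lower it where the agent visits less often.
    grad = agent_visits - expert_visits
    c = np.maximum(c + lr * grad, 0.0)   # keep costs nonnegative

print("learned constraint weights:", np.round(c, 2))
print("agent visitation:", np.round(solve_constrained_rl(c), 2))

Note that the stopping criterion here is behavioral, as in the method description above: the loop settles when the agent's visitation frequencies agree with the expert's, not when any particular underlying constraint function is recovered.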
Paper link: https://openreview.net/forum?id=8sSnD78NqTN
To attend this PhD seminar in person, please go to DC 2585. You can also attend using Zoom at https://vectorinstitute.zoom.us/j/83195064721.