PhD Defence • Machine Learning | Reinforcement Learning • Techniques to Learn Constraints from Demonstrations

Friday, May 16, 2025 11:00 am - 2:00 pm EDT (GMT -04:00)

Please note: This PhD defence will take place online.

Ashish Gaurav, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Pascal Poupart

Given demonstrations from an optimal expert, inverse reinforcement learning aims to learn an underlying reward function. However, it is limiting to assume that the reward function fully explains the expert's behaviour, since in many real-world settings the expert may be acting to satisfy additional behavioural constraints. Recovering these additional constraints falls within the paradigm of constraint learning from demonstrations. Specifically, in this work, we focus on the setting of inverse constraint learning (ICL), where we wish to learn a single but arbitrarily complex constraint from demonstrations, assuming the reward function is known in advance.
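Concretely, ICL can be viewed as inverting a constrained policy-optimization problem: the expert is assumed to maximize the known reward subject to an unknown constraint, and the goal is to recover that constraint. In one standard formulation (the notation here is ours, with r the known reward, c the unknown cost function, and β a budget):

\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} r(s_t, a_t)\Big] \quad \text{subject to} \quad \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} c(s_t, a_t)\Big] \le \beta

Given trajectories from an expert solving such a problem, ICL seeks a cost function c that best explains why the expert deviates from purely reward-optimal behaviour.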

For this setting, we first provide a framework to learn an expected constraint from constrained expert demonstrations. We then show how to translate an expected constraint into a probabilistic one, and extend the proposed framework to learn a probabilistic constraint from constrained expert demonstrations. Here, an expected constraint bounds the cumulative cost, averaged over a batch of trajectories, to lie within a budget, whereas a probabilistic constraint upper-bounds the probability that the cumulative cost of a trajectory exceeds a given threshold. Finally, we provide convergence guarantees for both proposed frameworks.
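In symbols (with notation of our own choosing, where C(τ) = \sum_t c(s_t, a_t) denotes the cumulative cost of a trajectory τ), the two constraint types can be written as:

\mathbb{E}_{\tau \sim \pi}\big[C(\tau)\big] \le \beta \qquad \text{(expected constraint)}

\Pr_{\tau \sim \pi}\big(C(\tau) \ge \tilde{\beta}\big) \le \delta \qquad \text{(probabilistic constraint)}

As an elementary illustration of how the two forms connect (not necessarily the translation developed in the thesis): for a nonnegative cost function, Markov's inequality gives \Pr(C \ge \tilde{\beta}) \le \mathbb{E}[C]/\tilde{\beta}, so an expected-cost budget of \beta = \delta \tilde{\beta} already certifies the probabilistic constraint at level δ.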

Following these approaches, we consider the complementary challenge of learning a constraint in a high-dimensional state-action space. In such a setting, the constraint function may truly depend on only a subset of the input features. We propose using a simple test from the hypothesis-testing literature to select this subset of features, thereby constructing a reduced input space for the constraint function. We also discuss the implications of using this approach in conjunction with an ICL algorithm.
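As an illustration only, since the abstract does not name the specific test: one simple choice would be a per-feature two-sample Kolmogorov-Smirnov test, comparing the state-action features visited by the constrained expert against those visited by a nominal, reward-only policy, and keeping the features whose marginal distributions differ significantly. All function and variable names in the sketch below are hypothetical.

# Illustrative feature selection for constraint learning via a per-feature
# two-sample hypothesis test. The Kolmogorov-Smirnov test below is one
# simple choice from the hypothesis-testing literature; the thesis may use
# a different test.
import numpy as np
from scipy.stats import ks_2samp

def select_constraint_features(expert_sa, nominal_sa, alpha=0.01):
    """Return indices of state-action features whose marginal distribution
    differs significantly between expert (constrained) trajectories and
    nominal (unconstrained, reward-only) trajectories.

    expert_sa, nominal_sa: arrays of shape (num_samples, num_features),
    each row a flattened state-action pair visited by the respective policy.
    """
    selected = []
    for j in range(expert_sa.shape[1]):
        # Null hypothesis: feature j is distributed identically under both
        # policies, i.e. the constraint does not act through this feature.
        _, p_value = ks_2samp(expert_sa[:, j], nominal_sa[:, j])
        if p_value < alpha:
            selected.append(j)
    return selected

# Toy example: the constraint depends on feature 0 only; features 1-2 are noise.
rng = np.random.default_rng(0)
expert = np.column_stack([rng.uniform(0.0, 0.4, 5000),   # expert avoids values > 0.4
                          rng.normal(size=(5000, 2))])
nominal = np.column_stack([rng.uniform(0.0, 1.0, 5000),  # unconstrained policy
                           rng.normal(size=(5000, 2))])
print(select_constraint_features(expert, nominal))        # typically [0]

In this toy example the expert avoids values of feature 0 above 0.4 while features 1 and 2 are unaffected noise, so the test typically selects only feature 0, yielding a one-dimensional input space for the constraint function.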

To validate the proposed approaches, we conduct experiments in synthetic, robotics, and real-world environments. For feature selection, we test our approach on environments with varying state-action space sizes.

Attend this PhD defence virtually on Zoom.