Please note: This PhD seminar will take place online.
Ashish Gaurav, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Pascal Poupart
Learning a constraint from demonstrations and a reward function, a problem known as inverse constraint learning (ICL), can be challenging because of the high dimensionality of the state-action space. In this setup, constraint learning alternates between policy optimization and constraint adjustment, starting from a randomly initialized constraint. This formulation assumes that the input features of the constraint function being learned are known in advance. In many cases, however, the input space of the constraint function cannot be specified a priori, because the features relevant for predicting its output are not known.
In this work, we propose using a simple test from the hypothesis testing literature to rank features by their relevance to the constraint function output (i.e., feature selection). We also discuss the implications of using our approach in conjunction with an ICL algorithm. We compare against prior algorithms that use mutual information for feature selection, and validate our approach on environments with varying state-action space sizes.
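
As a rough illustration of ranking features with a hypothesis test, the sketch below scores each feature by the absolute value of Welch's two-sample t statistic, computed between expert samples and samples from an unconstrained (nominal) policy. The abstract does not name the test used in the talk, so the choice of Welch's t-test, the function names `welch_t` and `rank_features`, and the synthetic data are all assumptions made for illustration only.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances allowed)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both samples are constant: no evidence if equal, unbounded otherwise.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_features(expert, nominal):
    """Return feature indices ordered from largest to smallest |t|.

    `expert` and `nominal` are lists of feature vectors (hypothetical inputs):
    one from demonstrations, one from an unconstrained policy.
    """
    n_features = len(expert[0])
    scores = []
    for j in range(n_features):
        expert_col = [row[j] for row in expert]
        nominal_col = [row[j] for row in nominal]
        scores.append(abs(welch_t(expert_col, nominal_col)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Synthetic data (assumption): only feature 1 is distributed differently
# under the expert, standing in for a feature the constraint depends on.
rng = random.Random(0)
nominal = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
expert = [[rng.gauss(0, 1), rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(500)]

ranking = rank_features(expert, nominal)
print(ranking)
```

In this toy setting the shifted feature should come first in the ranking. A test of this kind looks at each feature separately, so it would not detect relevance that only appears through interactions between features.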