Please note: This PhD defence will take place online.
David Radke, PhD candidate
David R. Cheriton School of Computer Science
Supervisors: Professors Kate Larson, Tim Brecht
Across many domains, the ability to work in teams can magnify a group’s abilities beyond the capabilities of any individual. While the science of teamwork is typically studied in organizational psychology (OP) and areas of biology, understanding how multiple agents can work together is an important topic in artificial intelligence (AI) and multiagent systems (MAS). Teams in AI have taken many forms, including ad hoc teamwork [Stone et al., 2010], hierarchical structures of rule-based agents [Tambe, 1997], and teams of multiagent reinforcement learning (MARL) agents [Baker et al., 2020]. Despite significant evidence in the natural world about the impact of family structure on child development and health [Lee et al., 2015; Umberson et al., 2020], the impact of team structure on the policies that individual learning agents develop is not often explicitly studied. In this thesis, we hypothesize that teams can provide significant advantages in guiding the development of policies for individual agents that learn from experience.
We focus on mixed-motive domains, where long-term global welfare is maximized through global cooperation. We present a model of multiagent teams with individual learning agents inspired by OP and early work using teams in AI, and introduce credo, a model that defines how agents optimize their behavior for the goals of the various groups they belong to: themselves (a group of one), any teams they belong to, and the entire system. We find that, in various settings, teams help agents develop cooperative policies with agents in other teams despite game-theoretic incentives to defect, and that these policies are robust to some amount of selfishness. While previous work assumed that a fully cooperative population (in which all agents share rewards) obtains the best possible performance in mixed-motive domains [Yang et al., 2020; Gemp et al., 2020], we show that there exist multiple configurations of team structures and credo parameters that achieve about 33% more reward than the fully cooperative system. Agents in these scenarios learn more effective joint policies while maintaining high reward equality. Inspired by these results, we derive theoretical underpinnings that characterize settings where teammates may or may not be beneficial for learning. We also propose a preliminary credo-regulating agent architecture to autonomously discover favorable learning conditions in challenging settings.
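To make the idea of credo concrete, the sketch below shows one way an agent's learning signal could blend the goals of the three groups named above. The abstract does not give a formula, so everything here is an assumption made for illustration: credo is encoded as three non-negative weights (self, team, system) that sum to 1, each group's goal is taken to be the mean reward of its members, and the names `credo_rewards`, `team_of`, and `credos` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical encoding: (self, team, system) weights, non-negative, summing to 1.
Credo = Tuple[float, float, float]


def credo_rewards(
    rewards: List[float], team_of: List[int], credos: List[Credo]
) -> List[float]:
    """Blend each agent's own reward with its team's and the system's mean reward.

    rewards[i] is agent i's individual reward for one step, team_of[i] is the
    id of the team agent i belongs to, and credos[i] is agent i's weighting.
    Using the mean as the group-level goal is an assumption of this sketch.
    """
    if not (len(rewards) == len(team_of) == len(credos)):
        raise ValueError("rewards, team_of and credos must have the same length")
    if not rewards:
        return []
    for credo in credos:
        if min(credo) < 0 or abs(sum(credo) - 1.0) > 1e-9:
            raise ValueError("each credo must be non-negative and sum to 1")

    # System-level goal: mean reward over all agents.
    system_mean = sum(rewards) / len(rewards)

    # Team-level goal: mean reward over the members of each team.
    team_members: Dict[int, List[float]] = defaultdict(list)
    for reward, team in zip(rewards, team_of):
        team_members[team].append(reward)
    team_mean = {team: sum(rs) / len(rs) for team, rs in team_members.items()}

    # Weighted combination of the three goals for every agent.
    return [
        w_self * reward + w_team * team_mean[team] + w_system * system_mean
        for reward, team, (w_self, w_team, w_system) in zip(rewards, team_of, credos)
    ]
```

Under this encoding, a credo of (1, 0, 0) gives a purely selfish agent and (0, 0, 1) for every agent gives the fully cooperative population, so intermediate weightings and team assignments span the configurations between those two extremes.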